On the convergence of maximum variance unfolding 
Abstract. Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent.
1 Introduction

One of the basic tasks in unsupervised learning, aka multivariate statistics, is that of dimensionality reduction. While the celebrated Principal Components Analysis (PCA) and Multidimensional Scaling (MDS) assume that the data lie near an affine subspace, modern approaches postulate that the data are in the vicinity of a submanifold. Many such algorithms have been proposed in the past decade, for example, ISOMAP (Tenenbaum et al., 2000), Locally Linear Embedding (LLE) (Roweis and Saul, 2000), Laplacian Eigenmaps (Belkin and Niyogi, 2003), Manifold Charting (Brand, 2003), Diffusion Maps (Coifman and Lafon, 2006), Hessian Eigenmaps (HLLE) (Donoho and Grimes, 2003), Local Tangent Space Alignment (LTSA) (Zhang and Zha, 2004), Maximum Variance Unfolding (Weinberger et al., 2004), and many others, some reviewed in (Saul et al., 2006; Van der Maaten et al., 2008).

Although some variants exist, the basic setting is that of a connected domain $D \subset \mathbb{R}^d$ isometrically embedded in Euclidean space as a submanifold $M \subset \mathbb{R}^p$, with $p > d$. We are provided with data points $x_1, \dots, x_n \in \mathbb{R}^p$ sampled from (or near) $M$, and our goal is to output $y_1, \dots, y_n \in \mathbb{R}^d$ that can be isometrically mapped to (or close to) $x_1, \dots, x_n$.

A number of consistency results exist in the literature. For example, Bernstein et al. (2000) show that, with proper tuning, geodesic distances may be approximated by neighborhood graph distances when the submanifold $M$ is geodesically convex, implying that ISOMAP asymptotically recovers the isometry when $D$ is convex. When $D$ is not convex, ISOMAP fails in general (Zha and Zhang, 2003). To justify HLLE, Donoho and Grimes (2003) show that the null space of the (continuous) Hessian operator yields an isometric embedding. See also (Ye and Zhi, 2012) for related results in a discrete setting. Smith et al. (2008) prove that LTSA is able to recover the isometry, but only up to an affine transformation. We also mention other results in the literature which show that, as the sample size increases, the output of the algorithm converges to an explicit continuous embedding. For instance, a number of papers analyze how well the discrete graph Laplacian based on a sample approximates the continuous Laplace-Beltrami operator on a submanifold (Belkin and Niyogi, 2005; Coifman and Lafon, 2006; Giné and Koltchinskii, 2006; Hein et al., 2005; Singer, 2006; von Luxburg et al., 2008), which is intimately related to Laplacian Eigenmaps. However, such convergence results do not guarantee that the algorithm is successful at recovering the isometry when one exists. In fact, as discussed in detail by Goldberg et al. (2008) and Perrault-Joncas and Meila (2012), many of them fail in very simple settings.

In this paper, we analyze Maximum Variance Unfolding (MVU) in the large-sample limit. We 
are only aware of a very recent work of Paprotny and Garcke (2012) that establishes that, under 
the assumption that D is convex, MVU recovers a distance matrix that approximates the geodesic 
distance matrix of the data. Our contribution is the following. In Section 2, we prove a convergence 
result, showing that the optimization problem that MVU solves converges (both in solution space 
and value) to a continuous version defined on the whole submanifold. The basic assumption here 
is that the submanifold $M$ is compact. In Section 3, we derive quantitative convergence rates, with
mild additional regularity assumptions. In Section 4, we consider the solutions to the continuum 
limit. When D is convex, we prove that MVU recovers an isometry. We also provide examples 
of non-convex D where MVU provably fails at recovering an isometry. We also prove that MVU 
is robust to noise, which Goldberg et al. (2008) show to be problematic for algorithms like LLE, 
HLLE and LTSA. Some concluding remarks are in Section 5. 

2 From discrete MVU to continuum MVU 

In this section we state and prove a qualitative convergence result for MVU. This result applies 
with only minimal assumptions and its proof is relatively transparent. What we show is that the 
(discrete) MVU optimization problem converges to an explicit continuous optimization problem 
when the sample size increases. The continuous optimization problem is amenable to scrutiny with 
tools from analysis and geometry, and that will enable us to better understand (in Section 4) when 
MVU succeeds, and when it fails, at recovering an isometry to a Euclidean domain when it exists. 

Let us start by recalling the MVU algorithm (Weinberger et al., 2005, 2004; Weinberger and Saul, 2006). We are provided with data points $x_1, \dots, x_n \in \mathbb{R}^p$. Let $\|\cdot\|$ denote the Euclidean norm. Let $\mathcal{Y}_{n,r}$ be the (random) set defined by

$$\mathcal{Y}_{n,r} = \big\{ y_1, \dots, y_n \in \mathbb{R}^p : \|y_i - y_j\| \le \|x_i - x_j\| \text{ whenever } \|x_i - x_j\| \le r \big\}.$$

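Choosing a neighborhood radius $r > 0$, MVU solves the following optimization problem:

Discrete MVU

$$\text{Maximize } \mathcal{E}(Y) := \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \ne i} \|y_i - y_j\|^2, \quad \text{over } Y = (y_1, \dots, y_n)^\top \in \mathbb{R}^{n \times p}, \tag{1}$$

$$\text{subject to } Y \in \mathcal{Y}_{n,r}. \tag{2}$$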
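To make (1) and (2) concrete, here is a minimal pure-Python sketch, assuming points are given as tuples of coordinates. The helper names `energy` and `is_feasible` are hypothetical; they only evaluate the objective and check the constraint. Actually solving Discrete MVU requires a semidefinite-programming solver, which is not shown here.

```python
import math
from itertools import combinations


def energy(Y):
    """Objective (1): average of ||y_i - y_j||^2 over ordered pairs i != j."""
    n = len(Y)
    total = sum(2 * math.dist(Y[i], Y[j]) ** 2  # each unordered pair counts twice
                for i, j in combinations(range(n), 2))
    return total / (n * (n - 1))


def is_feasible(X, Y, r, tol=1e-12):
    """Constraint (2): Y lies in Y_{n,r}, i.e. r-neighbours x_i, x_j do not move apart."""
    for i, j in combinations(range(len(X)), 2):
        d_x = math.dist(X[i], X[j])
        if d_x <= r and math.dist(Y[i], Y[j]) > d_x + tol:
            return False
    return True
```

For instance, `Y = X` is always feasible, so `energy(X)` is a lower bound on the optimal value of Discrete MVU.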

When the data points are sampled from a distribution $\mu$ with support $M$, our main result in this section shows that, when $M$ is sufficiently regular and $r = r_n \to 0$ sufficiently slowly, the discrete optimization problem converges to the following continuous optimization problem:

Continuum MVU

$$\text{Maximize } \mathcal{E}(f) := \int_{M \times M} \|f(x) - f(x')\|^2 \, \mu(dx)\, \mu(dx'), \quad \text{over } f : M \to \mathbb{R}^p, \tag{3}$$

$$\text{subject to } f \text{ is Lipschitz with } \|f\|_{\mathrm{Lip}} \le 1, \tag{4}$$

where $\|f\|_{\mathrm{Lip}}$ denotes the smallest Lipschitz constant of $f$. It is important to realize that the Lipschitz condition is with respect to the intrinsic metric on $M$ (i.e., the metric inherited from the ambient space $\mathbb{R}^p$), defined as follows: for $x, x' \in M$, let

$$\delta_M(x, x') = \inf\{T : \exists \gamma : [0, T] \to M \text{ 1-Lipschitz, with } \gamma(0) = x \text{ and } \gamma(T) = x'\}. \tag{5}$$

When $M$ is compact, the infimum is attained. In that case, $\delta_M(x, x')$ is the length of the shortest continuous path on $M$ starting at $x$ and ending at $x'$, and $(M, \delta_M)$ is a complete metric space, also called a length space in the context of metric geometry (Burago et al., 2001). Then $f : M \to \mathbb{R}^p$ is Lipschitz with $\|f\|_{\mathrm{Lip}} \le L$ if

$$\|f(x) - f(x')\| \le L\, \delta_M(x, x'), \quad \forall x, x' \in M. \tag{6}$$

For any $L > 0$, denote by $\mathcal{F}_L$ the class of Lipschitz functions $f : M \to \mathbb{R}^p$ satisfying (6).

One of the central conditions is that $M$ is sufficiently regular that the intrinsic metric on $M$ is locally close to the ambient Euclidean metric.

Regularity assumption. There is a non-decreasing function $c : [0, \infty) \to [0, \infty)$ with $c(r) \to 0$ when $r \to 0$, such that, for all $x, x' \in M$,

$$\delta_M(x, x') \le \big(1 + c(\|x - x'\|)\big)\, \|x - x'\|. \tag{7}$$

This assumption is also central to ISOMAP. Bernstein et al. (2000) prove that it holds when M 
is a compact, smooth and geodesically convex submanifold (e.g., without boundary). In Lemma 4, 
we extend this to compact, smooth submanifolds with smooth boundary, and to tubular neighbor- 
hoods of such sets. The latter allows us to study noisy settings. 
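As a simple illustration of (7), consider the unit circle in $\mathbb{R}^2$, a compact smooth submanifold without boundary (this example is an assumption of ours, not taken from the text). The intrinsic distance between two points at chord length $s$ is $2\arcsin(s/2)$, so one valid choice is $c(r) = 2\arcsin(r/2)/r - 1$, held constant for $r \ge 2$; it is non-decreasing and tends to 0. The sketch below, with hypothetical function names, evaluates both quantities.

```python
import math


def intrinsic_dist_circle(x, y):
    """Arc-length (intrinsic) distance between two points of the unit circle centred at 0."""
    chord = math.dist(x, y)
    return 2 * math.asin(min(1.0, chord / 2))


def c_circle(r):
    """A valid function c in (7) for the unit circle: delta_M = (1 + c(s)) * s at chord length s."""
    if r == 0:
        return 0.0
    r = min(r, 2.0)  # chords never exceed the diameter 2, so c is constant beyond r = 2
    return 2 * math.asin(r / 2) / r - 1
```

Here (7) holds with equality for every pair of points, which shows that the bound cannot be improved for this particular $M$.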

Note that we always have

$$\|x - x'\| \le \delta_M(x, x'). \tag{8}$$

Let $\mathcal{S}_1$ denote the set of functions that are solutions of Continuum MVU. We state the following qualitative result, which makes minimal assumptions.

Theorem 1. Let $\mu$ be a (Borel) probability distribution with support $M \subset \mathbb{R}^p$, which is connected, compact and satisfies (7), and assume that $x_1, \dots, x_n$ are sampled independently from $\mu$. Then, for $r_n \to 0$ sufficiently slowly, we have

$$\sup\{\mathcal{E}(Y) : Y \in \mathcal{Y}_{n,r_n}\} \to \sup\{\mathcal{E}(f) : f \in \mathcal{F}_1\}, \tag{9}$$

and for any solution $Y_n = (y_1, \dots, y_n)$ of Discrete MVU,

$$\inf_{f \in \mathcal{S}_1} \max_{1 \le i \le n} \|y_i - f(x_i)\| \to 0, \tag{10}$$

almost surely as $n \to \infty$.

Thus Discrete MVU converges to Continuum MVU in the large sample limit, if $M$ satisfies the crucial regularity condition (7) and other mild assumptions. In Section 3, we provide explicit quantitative bounds for the convergence results (9) and (10), under some additional (though natural) assumptions. In Section 4, we focus entirely on Continuum MVU, with the goal of better understanding the functions that are solutions to that optimization problem. Because of (10), we know that the output of Discrete MVU converges in a strong sense to one of these functions.

The rest of the section is dedicated to proving Theorem 1. We divide the proof into several 
parts which we discuss at length, and then assemble to prove the theorem. 
2.1 Coverings and graph neighborhoods

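For $r > 0$, let $G_r$ denote the undirected graph with nodes $x_1, \dots, x_n$ and an edge between $x_i$ and $x_j$ if $\|x_i - x_j\| \le r$. This is the $r$-neighborhood graph based on the data. It is essential that $G_{r_n}$ be connected, for otherwise two connected components can be translated arbitrarily far apart without violating (2), so that $\sup\{\mathcal{E}(Y) : Y \in \mathcal{Y}_{n,r_n}\} = \infty$, while $\sup\{\mathcal{E}(f) : f \in \mathcal{F}_1\}$ is finite. The latter comes from the fact that, for any $f \in \mathcal{F}_1$,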



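$$\mathcal{E}(f) \le \int_{M \times M} \delta_M(x, x')^2 \, \mu(dx)\, \mu(dx') \le \operatorname{diam}(M)^2,$$

where we used (6) in the first inequality, and $\operatorname{diam}(M)$ is the intrinsic diameter of $M$, i.e.,

$$\operatorname{diam}(M) := \sup_{x, x' \in M} \delta_M(x, x'). \tag{11}$$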

Recall that the only assumptions on $M$ made in Theorem 1 are that $M$ is compact, connected, and satisfies (7), and this implies that $\operatorname{diam}(M) < \infty$. Indeed, as a compact subset of $\mathbb{R}^p$, $M$ is bounded, hence $\sup_{x, x' \in M} \|x - x'\| < \infty$. Plugging this into (7) immediately implies that $\operatorname{diam}(M) < \infty$. That said, we ask more of $(r_n)$ than simply having $G_{r_n}$ connected. For $\eta > 0$, define

$$\Omega(\eta) = \{\forall x \in M, \exists i = 1, \dots, n : \|x - x_i\| \le \eta\}, \tag{12}$$

which is the event that $x_1, \dots, x_n$ form an $\eta$-covering of $M$.
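The event (12) can be checked approximately in simulation. The sketch below assumes $M$ is the unit circle and replaces the supremum over $M$ by a maximum over a fine grid, so it can slightly underestimate the true covering radius; it computes $\max_x \min_i \|x - x_i\|$. The function names are hypothetical.

```python
import math


def covering_radius_circle(points, grid_size=2000):
    """Grid approximation of sup over x in the unit circle of min_i ||x - x_i||."""
    worst = 0.0
    for k in range(grid_size):
        t = 2 * math.pi * k / grid_size
        x = (math.cos(t), math.sin(t))
        worst = max(worst, min(math.dist(x, p) for p in points))
    return worst


def omega_holds(points, eta, grid_size=2000):
    """Approximate indicator of the event Omega(eta) in (12)."""
    return covering_radius_circle(points, grid_size) <= eta
```

Averaging `omega_holds` over repeated random samples gives a Monte Carlo estimate of $\mathbb{P}(\Omega(\eta))$ for this toy model.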
Connectivity requirement. $r_n \to 0$ in such a way that

$$\sum_{n=1}^{\infty} \mathbb{P}\big(\Omega(\lambda_n r_n)^c\big) < \infty, \quad \text{for some sequence } \lambda_n \to 0. \tag{13}$$

Since $M$ is the support of $\mu$, there is always a sequence $(r_n)$ that satisfies the Connectivity requirement. To see this, for $\eta > 0$, let $z_1, \dots, z_{N_\eta}$ be an $\eta$-packing of $M$ of maximal size $N_\eta$, i.e., a maximal collection of points such that $\|z_i - z_j\| > \eta$ for all $i \ne j$. Recall that a maximal $\eta$-packing is also an $\eta$-covering of $M$, and note that $N_\eta < \infty$ by compactness of $M$. Let $p_\eta = \min_j \mu(B(z_j, \eta))$. Since $M$ is the support of $\mu$, $\mu(B(z, \eta)) > 0$ for any $z \in M$ and any $\eta > 0$, where $B(z, \eta)$ denotes the Euclidean ball centered at $z$ and of radius $\eta > 0$. Hence, $p_\eta > 0$ for any $\eta > 0$. We have

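$$\begin{aligned}
\mathbb{P}\big(\Omega(2\eta)^c\big) &= \mathbb{P}\big(\text{there exists } x \in M : \forall i = 1, \dots, n,\ \|x - x_i\| > 2\eta\big) \\
&\le \mathbb{P}\big(\text{there is } j \text{ such that } B(z_j, \eta) \text{ is empty of data points}\big) \\
&\le \sum_{j=1}^{N_\eta} \mathbb{P}\big(B(z_j, \eta) \text{ is empty of data points}\big) \\
&\le N_\eta (1 - p_\eta)^n.
\end{aligned}$$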

Let $\eta_n = \inf\{\eta > 0 : N_\eta (1 - p_\eta)^n \le 1/n^2\}$; the sequence $1/n^2$ is chosen here for simplicity of exposition, but more general sequences can be considered, as will become apparent at the end of the paragraph.

Since $p_\eta > 0$ for all $\eta > 0$, we have $\eta_n \to 0$. To see this, let $\eta' = \operatorname{diam}(M)$. Clearly, for all $\eta > \eta'$, $p_\eta = 1$, which implies that the set of $\eta > 0$ such that $N_\eta (1 - p_\eta)^n \le 1/n^2$ is non-empty. In particular, for all $n \ge 1$, we have $\eta_n \le \eta'$. Now, let $\varepsilon > 0$ be fixed. Since $p_\varepsilon > 0$, there exists an integer $n_\varepsilon$ such that $N_\varepsilon (1 - p_\varepsilon)^n \le 1/n^2$ for all $n \ge n_\varepsilon$, so that $\eta_n \le \varepsilon$ for all $n \ge n_\varepsilon$. Since $\varepsilon$ is arbitrary, this proves that the sequence $(\eta_n)$ converges to 0 as $n$ tends to infinity.
With such a choice of $(\eta_n)$, we have $\sum_{n \ge 1} \mathbb{P}(\Omega(2\eta_n)^c) \le \sum_{n \ge 1} 1/n^2 < \infty$. Therefore, if we take $r_n = \sqrt{\eta_n}$, it satisfies the Connectivity requirement with $\lambda_n = 2\sqrt{\eta_n} \to 0$, since then $\lambda_n r_n = 2\eta_n$. In Section 3.2 we derive a quantitative bound on $r_n$ that guarantees (13) under additional assumptions. Note that the sequence $(1/n^2)$ in the definition of $\eta_n$ can be replaced by any summable decreasing sequence.
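The construction of $(\eta_n)$ above is explicit enough to compute in a toy case. The sketch below assumes $\mu$ is the uniform distribution on the unit circle, for which $\mu(B(z, \eta)) = (2/\pi)\arcsin(\eta/2)$ when $\eta \le 2$. It replaces $N_\eta$ by the upper bound $\lfloor \pi / \arcsin(\eta/2) \rfloor$ (points of an $\eta$-packing are separated by angles larger than $2\arcsin(\eta/2)$), which keeps the bound on $\mathbb{P}(\Omega(2\eta)^c)$ valid. It searches a grid of $\eta$ values, so it returns an upper approximation of $\eta_n$; all names are hypothetical.

```python
import math


def p_eta(eta):
    """mu(B(z, eta)) for mu uniform on the unit circle (arc fraction within chord distance eta)."""
    return 1.0 if eta >= 2 else (2 / math.pi) * math.asin(eta / 2)


def packing_bound(eta):
    """Upper bound on the size N_eta of an eta-packing of the unit circle."""
    return 1 if eta >= 2 else math.floor(math.pi / math.asin(eta / 2))


def eta_n(n, step=1e-4):
    """Smallest grid value eta = k * step with packing_bound(eta) * (1 - p_eta)^n <= 1/n^2."""
    k = 1
    while True:
        eta = k * step
        if packing_bound(eta) * (1 - p_eta(eta)) ** n <= 1 / n ** 2:
            return eta  # the loop always stops, at the latest when eta >= 2
        k += 1
```

Taking $r_n = \sqrt{\eta_n}$, as in the text, then gives a neighborhood radius satisfying the Connectivity requirement for this toy model.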

The rationale behind the requirement on $(r_n)$ is the same as in (Bernstein et al., 2000): it allows us to approximate each curve on $M$ with a path in $G_{r_n}$ of nearly the same length. We use this in the following subsection.



2.2 Interpolation 

Assuming that the sampling is dense enough that $\Omega(\eta)$ holds, we interpolate a set of vectors $Y \in \mathcal{Y}_{n,r}$ with a Lipschitz function $f \in \mathcal{F}_{1 + 6\eta/r}$. Formally, we have the following.

Lemma 1. Assume that $\Omega(\eta)$ holds with $\eta \le r/4$. Then any vector $Y = (y_1, \dots, y_n) \in \mathcal{Y}_{n,r}$ is of the form $Y = (f(x_1), \dots, f(x_n))$ for some $f \in \mathcal{F}_{1 + 6\eta/r}$.

We now prove this result. The first step is to show that such an interpolation is possible at all, in the sense that

$$\|y_i - y_j\| \le (1 + 6\eta/r)\, \delta_M(x_i, x_j), \quad \forall i, j. \tag{14}$$

This shows that the map $g : \{x_1, \dots, x_n\} \to \mathbb{R}^p$ defined by $g(x_i) = y_i$ for all $i$ is Lipschitz (for $\delta_M$ and the Euclidean metric) with constant $L = 1 + 6\eta/r$. We apply a form of Kirszbraun's Extension Theorem (Lang and Schroeder, 1997, Th. B) or (Brudnyi and Brudnyi, 2012, Th. 1.26) to extend $g$ to the whole of $M$ into $f \in \mathcal{F}_{1 + 6\eta/r}$.

Therefore, let us turn to proving (14). The arguments are very similar to those in (Bernstein et al., 2000). If $\delta_M(x_i, x_j) \le r$, then, by (8), $\|x_i - x_j\| \le r$, which implies that

$$\|y_i - y_j\| \le \|x_i - x_j\| \le \delta_M(x_i, x_j).$$

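Now suppose that $\delta_M(x_i, x_j) > r$. Let $\gamma$ be a path in $M$ connecting $x_i$ to $x_j$ of minimal length $\ell = \delta_M(x_i, x_j)$. Split $\gamma$ into $N$ arcs of length $\ell_1 = r/2$ plus one arc of length $\ell_{N+1} < \ell_1$, so that

$$\ell = N\ell_1 + \ell_{N+1}, \quad \text{and in particular} \quad N \le \ell/\ell_1.$$

Denote by $x_i = x'_0, x'_1, \dots, x'_N, x'_{N+1} = x_j$ the extremities of the arcs along $\gamma$.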

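For $k = 1, \dots, N$, let $t_k \in \operatorname{argmin}_t \|x'_k - x_t\|$, and set $t_0 = i$. On $\Omega(\eta)$, $\delta_M(x'_k, x_{t_k}) \le \eta$ for all $k$, so that

$$\|x_{t_k} - x_{t_{k-1}}\| \le \delta_M(x_{t_k}, x_{t_{k-1}}) \le \delta_M(x'_k, x'_{k-1}) + 2\eta \le \ell_1 + 2\eta \le r/2 + 2(r/4) = r.$$

Hence, because $Y = (y_1, \dots, y_n) \in \mathcal{Y}_{n,r}$,

$$\|y_{t_k} - y_{t_{k-1}}\| \le \ell_1 + 2\eta.$$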

Similarly, for the last arc, recalling that $x_{t_{N+1}} = x_j$, we have $\delta_M(x_j, x_{t_N}) \le \ell_{N+1} + \eta \le \ell_1 + \eta \le r$, and therefore

$$\|y_{t_{N+1}} - y_{t_N}\| \le \ell_{N+1} + \eta.$$

Consequently,

$$\begin{aligned}
\|y_i - y_j\| &\le N(\ell_1 + 2\eta) + (\ell_{N+1} + \eta) \\
&= N\ell_1 + \ell_{N+1} + (2N + 1)\eta \\
&= \ell + (2N + 1)\eta.
\end{aligned}$$
We have

$$(2N + 1)\eta \le \Big(\frac{2\ell}{\ell_1} + 1\Big)\eta \le \frac{3\ell\eta}{\ell_1} = \frac{6\ell\eta}{r},$$

where the second inequality uses $\ell > r > \ell_1$, and so (14) holds.
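The counting step at the end of the proof of Lemma 1 can be sanity-checked numerically. The sketch below, with hypothetical names, splits a path of length $\ell > r$ into arcs of length $\ell_1 = r/2$ plus a remainder, and compares $\ell + (2N+1)\eta$ with the right-hand side of (14) for random admissible $(\ell, r, \eta)$ with $\eta \le r/4$. It is a check of the inequality, not part of the proof.

```python
import math
import random


def chain_bound(ell, r, eta):
    """l + (2N + 1) * eta, the bound on ||y_i - y_j|| obtained in the proof of Lemma 1."""
    ell1 = r / 2
    N = math.floor(ell / ell1)   # number of full arcs of length r/2
    ell_last = ell - N * ell1    # remaining arc, of length < r/2
    return N * (ell1 + 2 * eta) + (ell_last + eta)


def lipschitz_bound(ell, r, eta):
    """Right-hand side of (14) with delta_M(x_i, x_j) = l."""
    return (1 + 6 * eta / r) * ell


def check_lemma1(trials=10_000, seed=0):
    """Return True if chain_bound <= lipschitz_bound on random admissible inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        r = rng.uniform(0.01, 1.0)
        eta = rng.uniform(0.0, r / 4)
        ell = rng.uniform(r, 10.0)  # the case delta_M(x_i, x_j) > r
        if chain_bound(ell, r, eta) > lipschitz_bound(ell, r, eta) + 1e-12:
            return False
    return True
```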

2.3 Bounds on the energy 

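We call $\mathcal{E}$ the energy functional. For a function $f : \{x_1, \dots, x_n\} \to \mathbb{R}^p$, let $Y_n(f) = (f(x_1), \dots, f(x_n))^\top \in \mathbb{R}^{n \times p}$. Assume that $\Omega(\eta)$ holds with $\eta \le r/4$. Then Lemma 1 implies that any $Y \in \mathcal{Y}_{n,r}$ is equal to $Y_n(f)$ for some $f \in \mathcal{F}_{1 + 6\eta/r}$. Hence,

$$\sup_{Y \in \mathcal{Y}_{n,r}} \mathcal{E}(Y) \le \sup_{f \in \mathcal{F}_{1 + 6\eta/r}} \mathcal{E}(Y_n(f)). \tag{15}$$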

Recall the function $c(r)$ introduced in (7), and assume that $r > 0$ is small enough that $c(r) < 1$. For $f \in \mathcal{F}_{1 - c(r)}$ and for any $i, j$ such that $\|x_i - x_j\| \le r$, we have

$$\|f(x_i) - f(x_j)\| \le (1 - c(r))\, \delta_M(x_i, x_j) \le (1 - c(r))\big(1 + c(\|x_i - x_j\|)\big)\|x_i - x_j\|.$$

Since the function $c$ is non-decreasing, $c(\|x_i - x_j\|) \le c(r)$, and so

$$\|f(x_i) - f(x_j)\| \le (1 - c(r)^2)\, \|x_i - x_j\| \le \|x_i - x_j\|.$$

Consequently, $Y_n(f) \in \mathcal{Y}_{n,r}$, implying that

$$\sup_{Y \in \mathcal{Y}_{n,r}} \mathcal{E}(Y) \ge \sup_{f \in \mathcal{F}_{1 - c(r)}} \mathcal{E}(Y_n(f)). \tag{16}$$
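The argument leading to (16) can be illustrated on a half circle, which is isometric to an interval (an example we assume for illustration). Taking $f$ to be the arc-length coordinate, an element of $\mathcal{F}_1$, scaled by $1 - c(r)$, with $c$ the circle's regularity function, gives a feasible point of $\mathcal{Y}_{n,r}$. The unscaled coordinate, by contrast, violates the constraints, because chords are shorter than arcs. The sketch below, with hypothetical names, checks both claims.

```python
import math


def c_arc(r):
    """Regularity function c(r) of (7) for the unit circle, valid for 0 < r <= 2."""
    return 2 * math.asin(r / 2) / r - 1


def rescaled_unfolding_is_feasible(thetas, r, scale, tol=1e-12):
    """Check constraint (2) for x_i = (cos t_i, sin t_i) and y_i = scale * t_i (1-D coordinates)."""
    for i in range(len(thetas)):
        for j in range(i + 1, len(thetas)):
            arc = abs(thetas[i] - thetas[j])  # intrinsic distance on the half circle
            chord = 2 * math.sin(arc / 2)     # Euclidean distance ||x_i - x_j||
            if chord <= r and scale * arc > chord + tol:
                return False
    return True
```

With angles in $[0, \pi]$, the scaled unfolding stays feasible, which is exactly the mechanism behind (16).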

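As a result of (15) and (16), we have

$$\Big|\sup_{Y \in \mathcal{Y}_{n,r}} \mathcal{E}(Y) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big| \le \sup_{1 - c(r) \le L \le 1 + 6\eta/r} \Big|\sup_{f \in \mathcal{F}_L} \mathcal{E}(Y_n(f)) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big|. \tag{17}$$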

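We have

$$\Big|\sup_{f \in \mathcal{F}_L} \mathcal{E}(Y_n(f)) - \sup_{f \in \mathcal{F}_L} \mathcal{E}(f)\Big| \le \sup_{f \in \mathcal{F}_L} \big|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)\big|,$$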

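and applying the triangle inequality, we arrive at

$$\Big|\sup_{f \in \mathcal{F}_L} \mathcal{E}(Y_n(f)) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big| \le \sup_{f \in \mathcal{F}_L} \big|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)\big| + \Big|\sup_{f \in \mathcal{F}_L} \mathcal{E}(f) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big|.$$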

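Since $\mathcal{F}_L = L\mathcal{F}_1$ and $\mathcal{E}(Lf) = L^2 \mathcal{E}(f)$, we have

$$\Big|\sup_{f \in \mathcal{F}_L} \mathcal{E}(f) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big| \le |L^2 - 1| \sup_{f \in \mathcal{F}_1} \mathcal{E}(f) \le |L^2 - 1| \operatorname{diam}(M)^2,$$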
fen fen fen 

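and

$$\sup_{f \in \mathcal{F}_L} \big|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)\big| = L^2 \sup_{f \in \mathcal{F}_1} \big|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)\big|. \tag{18}$$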

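Consequently,

$$\Big|\sup_{f \in \mathcal{F}_L} \mathcal{E}(Y_n(f)) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big| \le L^2 \sup_{f \in \mathcal{F}_1} \big|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)\big| + |L^2 - 1| \operatorname{diam}(M)^2.$$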
Plugging this inequality into (17), on the event $\Omega(\eta)$ with $\eta \le r/4$, we have

$$\Big|\sup_{Y \in \mathcal{Y}_{n,r}} \mathcal{E}(Y) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big| \le (1 + 6\eta/r)^2 \sup_{f \in \mathcal{F}_1} \big|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)\big| + \beta(r, \eta)\big(2 + \beta(r, \eta)\big) \operatorname{diam}(M)^2, \tag{19}$$

where $\beta(r, \eta) := \max(c(r), 6\eta/r)$.

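Finally, we show that $\mathcal{E}$ is continuous (in fact Lipschitz) on $\mathcal{F}_1$ for the supnorm. For any $f$ and $g$ in $\mathcal{F}_1$, and any $x$ and $x'$ in $M$, we have

$$\begin{aligned}
\Big| \|f(x) - f(x')\|^2 - \|g(x) - g(x')\|^2 \Big| &\le \|f(x) - f(x') - g(x) + g(x')\|\, \|f(x) - f(x') + g(x) - g(x')\| \\
&\le \big[\|f(x) - g(x)\| + \|f(x') - g(x')\|\big] \times \big[\|f(x) - f(x')\| + \|g(x) - g(x')\|\big] \\
&\le 4\|f - g\|_\infty \operatorname{diam}(M).
\end{aligned}$$

The first inequality follows from the identity $\|a\|^2 - \|b\|^2 = \langle a - b, a + b \rangle$ and the Cauchy-Schwarz inequality. Hence,

$$|\mathcal{E}(f) - \mathcal{E}(g)| \le 4\|f - g\|_\infty \operatorname{diam}(M), \tag{20}$$

and

$$|\mathcal{E}(Y_n(f)) - \mathcal{E}(Y_n(g))| \le 4\|f - g\|_\infty \operatorname{diam}(M). \tag{21}$$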
2.4 More coverings and the Law of Large Numbers 

The last step is to show that the supremum of the empirical process (18) converges to zero. For this, we use a packing (covering) to reduce the supremum over $\mathcal{F}_1$ to a maximum over a finite set of functions. We then apply the Law of Large Numbers to each difference in the maximization. Fix $x_0 \in M$ and define

$$\mathcal{F}_1^0 = \{f \in \mathcal{F}_1 : f(x_0) = 0\}.$$

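Note that $f \in \mathcal{F}_1$ if, and only if, $f - f(x_0) \in \mathcal{F}_1^0$, and by the fact that $\mathcal{E}(f + a) = \mathcal{E}(f)$ for any function or vector $f$ and any constant $a \in \mathbb{R}^p$, we have

$$\sup_{f \in \mathcal{F}_1} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| = \sup_{f \in \mathcal{F}_1^0} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)|.$$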

The reason to use $\mathcal{F}_1^0$ is that it is bounded in supnorm. Indeed, for $f \in \mathcal{F}_1^0$, we have

$$\|f(x)\| = \|f(x) - f(x_0)\| \le \delta_M(x, x_0) \le \operatorname{diam}(M), \quad \forall x \in M.$$

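Let $N_\infty(\mathcal{F}_1^0, \varepsilon)$ denote the covering number of $\mathcal{F}_1^0$ for the supremum norm, i.e., the minimal number of balls of radius $\varepsilon$ that are necessary to cover $\mathcal{F}_1^0$, and let $f_1, \dots, f_N \in \mathcal{F}_1$ be an $\varepsilon$-covering of $\mathcal{F}_1^0$ of minimal size $N := N_\infty(\mathcal{F}_1^0, \varepsilon)$. Since $\mathcal{F}_1^0$ is equicontinuous and bounded, it is compact for the topology of the supremum norm by the Arzelà-Ascoli Theorem, so that $N_\infty(\mathcal{F}_1^0, \varepsilon) < \infty$ for any $\varepsilon > 0$.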

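Fix $f \in \mathcal{F}_1^0$ and let $k$ be such that $\|f - f_k\|_\infty \le \varepsilon$. By (20) and (21), we have

$$\begin{aligned}
|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| &\le |\mathcal{E}(Y_n(f)) - \mathcal{E}(Y_n(f_k))| + |\mathcal{E}(Y_n(f_k)) - \mathcal{E}(f_k)| + |\mathcal{E}(f_k) - \mathcal{E}(f)| \\
&\le 8 \operatorname{diam}(M) \|f - f_k\|_\infty + |\mathcal{E}(Y_n(f_k)) - \mathcal{E}(f_k)| \\
&\le 8 \operatorname{diam}(M)\, \varepsilon + |\mathcal{E}(Y_n(f_k)) - \mathcal{E}(f_k)|.
\end{aligned}$$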
Thus,

$$\sup_{f \in \mathcal{F}_1^0} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| \le 8 \operatorname{diam}(M)\, \varepsilon + \max\{|\mathcal{E}(Y_n(f_k)) - \mathcal{E}(f_k)| : k = 1, \dots, N\}. \tag{22}$$

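The Law of Large Numbers (LLN) implies that, for any bounded $f$, $\mathcal{E}(Y_n(f)) \to \mathcal{E}(f)$, almost surely as $n \to \infty$. Indeed,

$$\mathcal{E}(Y_n(f)) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \ne i} \|f(x_i) - f(x_j)\|^2 = \frac{2n}{n-1}\Big[\frac{1}{n}\sum_{i=1}^{n} \|f(x_i)\|^2 - \Big\|\frac{1}{n}\sum_{i=1}^{n} f(x_i)\Big\|^2\Big] \to 2\,\mathbb{E}\|f(x)\|^2 - 2\,\|\mathbb{E} f(x)\|^2 = \mathcal{E}(f),$$

almost surely as $n \to \infty$, by the LLN applied to each term. Therefore, when $\varepsilon > 0$ is fixed, the second term in (22) tends to zero almost surely, and since $\varepsilon > 0$ is arbitrary, we conclude that

$$\sup_{f \in \mathcal{F}_1} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| \to 0, \quad \text{in probability, as } n \to \infty. \tag{23}$$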
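The identity used above, which rewrites the pairwise energy of a sample in terms of its first two empirical moments, is easy to verify numerically. In this sketch (hypothetical names), `energy_pairs` is the direct $O(n^2)$ double sum and `energy_moments` is the $O(n)$ moment formula; Gaussian vectors in $\mathbb{R}^3$ are an arbitrary choice of test data.

```python
import random


def energy_pairs(A):
    """(1 / (n (n - 1))) * sum over i != j of ||a_i - a_j||^2, computed directly in O(n^2 p)."""
    n = len(A)
    s = sum(sum((u - v) ** 2 for u, v in zip(A[i], A[j]))
            for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 1))


def energy_moments(A):
    """Same quantity via (2n / (n - 1)) * [mean of ||a_i||^2 - ||mean of a_i||^2], in O(n p)."""
    n, p = len(A), len(A[0])
    mean_sq_norm = sum(sum(u * u for u in a) for a in A) / n
    mean = [sum(a[k] for a in A) / n for k in range(p)]
    return 2 * n / (n - 1) * (mean_sq_norm - sum(m * m for m in mean))


rng = random.Random(1)
sample = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
print(energy_pairs(sample), energy_moments(sample))  # the two values agree up to rounding
```

The moment form makes it visible why each term converges by the LLN, and it is also the cheaper way to evaluate (1).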



2.5 Large deviations of the sample energy 

To show almost sure convergence in (23), we need to refine the bound on the supremum of the empirical process (18). For this, we apply Hoeffding's Inequality for U-statistics (Hoeffding, 1963), which is a special case of (de la Peña and Giné, 1999, Thm. 4.1.8).

Lemma 2 (Hoeffding's Inequality for U-statistics). Let $\phi : M \times M \to \mathbb{R}$ be a bounded measurable map, and let $\{x_i : i \ge 1\}$ be a sequence of i.i.d. random variables with values in $M$. Assume that $\mathbb{E}[\phi(x_1, x_2)] = 0$ and that $b := \|\phi\|_\infty < \infty$, and let $\sigma^2 = \operatorname{Var}(\phi(x_1, x_2))$. Then, for all $t > 0$,

$$\mathbb{P}\Big(\frac{1}{n(n-1)} \sum_{i \ne j} \phi(x_i, x_j) > t\Big) \le \exp\Big(-\frac{n t^2}{5\sigma^2 + 3bt}\Big).$$

Let $f \in \mathcal{F}_1$. To bound the deviations of $\mathcal{E}(Y_n(f))$, we apply this result with $\phi(x, x') = \|f(x) - f(x')\|^2 - \mathcal{E}(f)$. Then,

$$\mathcal{E}(Y_n(f)) - \mathcal{E}(f) = \frac{1}{n(n-1)} \sum_{i \ne j} \phi(x_i, x_j).$$



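By construction, $\mathbb{E}[\phi(x_1, x_2)] = 0$. Since $f$ is Lipschitz with constant 1, for any $x$ and $x'$ in $M$, $\|f(x) - f(x')\|^2 \le \operatorname{diam}(M)^2$ and $\mathcal{E}(f) \le \operatorname{diam}(M)^2$. Hence $\|\phi\|_\infty \le \operatorname{diam}(M)^2$, and $\operatorname{Var}(\phi(x_1, x_2)) \le \mathbb{E}\|f(x_1) - f(x_2)\|^4 \le \operatorname{diam}(M)^4$. Applying Lemma 2 twice (once to $\phi$ and once to $-\phi$), we deduce that, for any $\varepsilon > 0$,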

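$$\mathbb{P}\big(|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| > \varepsilon\big) \le 2\exp\Big(-\frac{n\varepsilon^2}{5 \operatorname{diam}(M)^4 + 3 \operatorname{diam}(M)^2 \varepsilon}\Big). \tag{24}$$

Using (24) with $\varepsilon \operatorname{diam}(M)$ in place of $\varepsilon$ in (22), coupled with the union bound, we get that

$$\mathbb{P}\Big(\sup_{f \in \mathcal{F}_1} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| > 9\varepsilon \operatorname{diam}(M)\Big) \le N_\infty(\mathcal{F}_1^0, \varepsilon) \cdot 2\exp\Big(-\frac{n\varepsilon^2}{5 \operatorname{diam}(M)^2 + 3 \operatorname{diam}(M)\, \varepsilon}\Big). \tag{25}$$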



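Clearly, the right-hand side is summable for every fixed $\varepsilon > 0$, so by the Borel-Cantelli lemma the convergence in (23) in fact happens with probability one, that is,

$$\sup_{f \in \mathcal{F}_1} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| \to 0, \quad \text{almost surely, as } n \to \infty. \tag{26}$$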
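Inverting (24) for a single function $f$ indicates how the sample size scales with the accuracy. The sketch below (hypothetical names) evaluates the right-hand side of (24) and the smallest $n$ for which it drops below a target level $\delta$. It ignores the covering-number factor in (25), which is not explicit without further assumptions on $M$.

```python
import math


def deviation_bound(n, eps, diam):
    """Right-hand side of (24): 2 * exp(-n eps^2 / (5 diam^4 + 3 diam^2 eps))."""
    return 2 * math.exp(-n * eps ** 2 / (5 * diam ** 4 + 3 * diam ** 2 * eps))


def sample_size(eps, diam, delta):
    """Smallest n for which the bound (24) is at most delta (solve the exponent for n)."""
    return math.ceil((5 * diam ** 4 + 3 * diam ** 2 * eps) * math.log(2 / delta) / eps ** 2)
```

For instance, for the unit circle with the uniform distribution, one would pass the intrinsic diameter `diam = math.pi`; the $\varepsilon^{-2}$ growth of `sample_size` is the usual price of a Hoeffding-type bound.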
2.6 Convergence in value: proof of (9)



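Assume $r_n$ satisfies the Connectivity requirement, and that $n$ is large enough that $\max(c(r_n), 6\lambda_n) \le 1$. When $\Omega(\lambda_n r_n)$ holds, by (19), we have

$$\Big|\sup_{Y \in \mathcal{Y}_{n,r_n}} \mathcal{E}(Y) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big| \le (1 + 6\lambda_n)^2 \sup_{f \in \mathcal{F}_1} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| + 3\max(c(r_n), 6\lambda_n) \operatorname{diam}(M)^2,$$

while when $\Omega(\lambda_n r_n)$ does not hold, since the energies are bounded by $\operatorname{diam}(M)^2$, we have

$$\Big|\sup_{Y \in \mathcal{Y}_{n,r_n}} \mathcal{E}(Y) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big| \le 2 \operatorname{diam}(M)^2.$$

Combining these inequalities, we deduce that

$$\begin{aligned}
\Big|\sup_{Y \in \mathcal{Y}_{n,r_n}} \mathcal{E}(Y) - \sup_{f \in \mathcal{F}_1} \mathcal{E}(f)\Big| &\le 3\max(c(r_n), 6\lambda_n) \operatorname{diam}(M)^2\, \mathbb{1}_{\Omega(\lambda_n r_n)} + 2 \operatorname{diam}(M)^2\, \mathbb{1}_{\Omega(\lambda_n r_n)^c} \\
&\quad + (1 + 6\lambda_n)^2 \sup_{f \in \mathcal{F}_1} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)|.
\end{aligned} \tag{27}$$

Almost surely, the sum of the first two terms on the right-hand side tends to 0, by the fact that $c(r) \to 0$ when $r \to 0$, and by (13), since $r_n$ satisfies the Connectivity requirement (so that, by the Borel-Cantelli lemma, $\Omega(\lambda_n r_n)$ holds for all $n$ large enough). The third term tends to 0 by (26). Hence, (9) is established.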



2.7 Convergence in solution: proof of (10) 

Assume $r_n$ satisfies the Connectivity requirement, and that $n$ is large enough that $\lambda_n \le 1/4$. Let $Y_n$ denote any solution of Discrete MVU. When $\Omega(\lambda_n r_n)$ holds, by Lemma 1 there is $f_n \in \mathcal{F}_{1 + 6\lambda_n}$ such that $Y_n = Y_n(f_n)$. Note that the existence of the interpolating function $f_n$ holds on $\Omega(\lambda_n r_n)$ for each fixed $n$, and that this does not imply the existence of an interpolating sequence $(f_n)_{n \ge 1}$. That said, for each $\omega$ in the event $\liminf_n \Omega(\lambda_n r_n)$, there exist a sequence $f_n(\cdot\,; \omega)$ and an integer $n_0(\omega)$ such that $Y_n = Y_n(f_n)$ for all $n \ge n_0(\omega)$, i.e., the sequence interpolates a solution of Discrete MVU for all $n$ large enough. In addition, when $r_n$ satisfies the Connectivity requirement, $\mathbb{P}(\limsup_n \Omega(\lambda_n r_n)^c) = 0$ by the Borel-Cantelli lemma. Hence the event $\liminf_n \Omega(\lambda_n r_n)$ holds with probability one.

In fact, without loss of generality, we may assume that $f_n \in \mathcal{F}^0_{1 + 6\lambda_n} \subset \mathcal{F}^0_4$. Since $\mathcal{F}^0_4$ is equicontinuous and bounded, it is compact for the topology of the supnorm by the Arzelà-Ascoli Theorem. Hence, any subsequence of $(f_n)$ admits a further subsequence that converges in supnorm. And since $\mathcal{F}^0_L$ increases with $L$ and $\mathcal{F}^0_1 = \bigcap_{L > 1} \mathcal{F}^0_L$, any accumulation point of $(f_n)$ is in $\mathcal{F}^0_1$.

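In fact, if we define $\mathcal{S}_1^0 = \mathcal{S}_1 \cap \mathcal{F}_1^0$, then all the accumulation points of $(f_n)$ are in $\mathcal{S}_1^0$. Indeed, we have

$$\mathcal{E}(f_n) = \mathcal{E}(f_n) - \mathcal{E}(Y_n(f_n)) + \mathcal{E}(Y_n(f_n)),$$

with

$$|\mathcal{E}(f_n) - \mathcal{E}(Y_n(f_n))| \le \sup_{f \in \mathcal{F}_4} |\mathcal{E}(Y_n(f)) - \mathcal{E}(f)| \to 0,$$

by (18) and (26), and

$$\mathcal{E}(Y_n(f_n)) = \sup_{Y \in \mathcal{Y}_{n,r_n}} \mathcal{E}(Y) \to \sup_{f \in \mathcal{F}_1} \mathcal{E}(f),$$

by (9), almost surely as $n \to \infty$. Hence, if $f_\infty = \lim_k f_{n_k}$, by continuity of $\mathcal{E}$ on $\mathcal{F}^0_4$, we have

$$\mathcal{E}(f_\infty) = \lim_k \mathcal{E}(f_{n_k}) = \sup_{f \in \mathcal{F}_1} \mathcal{E}(f),$$

and given that $f_\infty \in \mathcal{F}_1^0$, we have $f_\infty \in \mathcal{S}_1^0$ by definition.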

The fact that $(f_n)$ lies in a compact set with all its accumulation points in $\mathcal{S}_1^0$ implies that

$$\inf_{f \in \mathcal{S}_1^0} \|f_n - f\|_\infty \to 0, \tag{28}$$

and since we have $\max_{1 \le i \le n} \|y_i - f(x_i)\| = \max_{1 \le i \le n} \|f_n(x_i) - f(x_i)\| \le \|f_n - f\|_\infty$, this immediately implies (10). The convergence in (28) is a consequence of the following simple result.

Lemma 3. Let $(a_n)$ be a sequence in a compact metric space with metric $\delta$, that has all its accumulation points in a set $A$. Then

$$\inf_{a \in A} \delta(a_n, a) \to 0.$$

Proof. If this is not the case, then there is $\varepsilon > 0$ such that $\inf_{a \in A} \delta(a_n, a) > \varepsilon$ for infinitely many $n$'s, denoted $n_1 < n_2 < \cdots$. The space being compact, $(a_{n_k})$ has at least one accumulation point, which is in $A$ by assumption. However, by construction, $(a_{n_k})$ cannot have an accumulation point in $A$. This is a contradiction. □



3 Quantitative convergence bounds 

We obtained a general, qualitative convergence result for MVU in the preceding section, and we now specify some of the supporting arguments to obtain quantitative convergence speeds. This requires some (natural) additional assumptions on $\mu$ and $M$. While the proof of a result like Theorem 1 is necessarily complex, we endeavored to make it as transparent and simple as we could. The present section is more technical, and the reader might choose to first read Section 4 to learn about the solutions to Continuum MVU, which imply consistency (and inconsistencies) for MVU as a dimensionality-reduction algorithm.
We consider two specific types of sets $M$:

• Thin sets. $M$ is a $d$-dimensional compact, connected, $C^2$ submanifold with $C^2$ boundary (if nonempty). In addition, $M \subset M^*$, where $M^*$ is a $d$-dimensional, geodesically convex $C^2$ submanifold.

• Thick sets. $M$ is a compact, connected subset that is the closure of its interior and has a $C^2$ boundary.

The ambient space is $\mathbb{R}^p$. Note that our results are equally valid for piecewise smooth sets. Thin sets are a model for noiseless data, where the data points are sampled from a submanifold; note that they may have holes and boundaries. Thick sets are a model for noisy data, where the data points are sampled from the vicinity of a submanifold.

An important example of thick sets are tubular neighborhoods of thin sets. For a set $A \subset \mathbb{R}^p$ and $\eta > 0$, the $\eta$-neighborhood of $A$ is the set of points in $\mathbb{R}^p$ within Euclidean distance $\eta$ of $A$, and is denoted $B(A, \eta)$. The reach of a set $A \subset \mathbb{R}^p$ is defined in (Federer, 1959) as the largest $\eta$ such that, for any $x \in B(A, \eta)$, there is a unique point $a \in A$ closest to $x$. We denote by $\rho(A)$ the reach of $A$. Note that any thin set $A$ has positive reach, which bounds its radius of curvature from below. For any thick set $A$, $\partial A$ is a thin set without boundary, while for any $\eta < \rho(A)$, $B(A, \eta)$ is a thick set, with boundary having reach $\ge \rho(A) - \eta$.

In what follows, $C$ and similar symbols denote constants that depend only on $p$ and $d$, which may change with each appearance.
3.1 The regularity condition

The first thing we do is specify the function $c$ in (7). When $M$ is a thin set, we define $r_M = \min\{\rho(M^*), \rho(\partial M)\}$, where by convention $\rho(\emptyset) = \infty$. When $M$ is a thick set, we let $r_M = \rho(\partial M)$. The following result seems to remain valid with $r_M = \rho(M)$ in both cases, but the proof seems much more involved.

Lemma 4. Whether $M$ is a thin or a thick set, (7) is valid with

$$c(r) = \frac{4r}{r_M}\, \mathbb{1}\{r < r_M/2\} + \infty \cdot \mathbb{1}\{r \ge r_M/2\}.$$

Proof. We borrow results from (Niyogi et al., 2008). Let $x, x' \in M$ such that $\|x - x'\| < r_M/2$.

First, suppose that $M$ is thick. Consider the line segment joining these two points. If this segment is included in $M$, then $\delta_M(x, x') = \|x - x'\|$. Otherwise, it intersects $\partial M$ in at least two points; among these points, let $z$ be the closest to $x$ and $z'$ the closest to $x'$. Since $\partial M$ has no boundary, it is geodesically convex, so that there is a geodesic on $\partial M$, denoted $\zeta$, joining $z$ and $z'$. (Niyogi et al., 2008, Prp. 6.3) applies since $\|z - z'\| \le \|x - x'\| < r_M/2 \le \rho(\partial M)/2$, and $\rho(\partial M)$ coincides with the condition number of $\partial M$ as defined in (Niyogi et al., 2008), where it is denoted by $\tau$. Hence, if $\ell$ is the length of $\zeta$, we have

$$\ell \le \rho(\partial M) - \rho(\partial M)\sqrt{1 - \frac{2\|z - z'\|}{\rho(\partial M)}} \le \|z - z'\| + \frac{4\|z - z'\|^2}{r_M}, \tag{29}$$

using the fact that $\sqrt{1 - t} \ge 1 - t/2 - t^2$ for all $t \in [0, 1]$ and $r_M \le \rho(\partial M)$. Let $\gamma$ be the path made of $\zeta$ concatenated with the segments $[xz]$ and $[z'x']$. If $L$ is the length of $\gamma$, we have

$$\begin{aligned}
L &= \|x - z\| + \|z' - x'\| + \ell \\
&\le \|x - z\| + \|z' - x'\| + \|z - z'\| + 4\|z - z'\|^2/r_M \\
&\le \|x - x'\| + 4\|x - x'\|^2/r_M,
\end{aligned}$$

using the fact that $x, z, z', x'$ are in that order on the line segment joining $x$ and $x'$. Since $\delta_M(x, x') \le L$, this concludes the proof when $M$ is thick.
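The elementary inequality $\sqrt{1 - t} \ge 1 - t/2 - t^2$ on $[0, 1]$, and its consequence in (29), namely $\tau - \tau\sqrt{1 - 2s/\tau} \le s + 4s^2/\tau$ for $0 \le s \le \tau/2$ (writing $\tau$ for the reach and $s$ for the chord length), can be checked on a grid. This is only a numerical sanity check over the stated ranges; the names are hypothetical.

```python
import math


def geodesic_bound(s, tau):
    """tau - tau * sqrt(1 - 2 s / tau): length bound from (Niyogi et al., 2008, Prp. 6.3), s <= tau / 2."""
    return tau - tau * math.sqrt(1 - 2 * s / tau)


def quadratic_bound(s, tau):
    """s + 4 s^2 / tau: the simplified bound in (29)."""
    return s + 4 * s ** 2 / tau


def check_bounds(tau=1.0, steps=1000):
    """Return True if both inequalities hold on a grid of t in [0, 1]."""
    for k in range(steps + 1):
        t = k / steps
        if math.sqrt(1 - t) < 1 - t / 2 - t ** 2 - 1e-12:
            return False
        s = t * tau / 2  # chord lengths from 0 to tau / 2
        if geodesic_bound(s, tau) > quadratic_bound(s, tau) + 1e-12:
            return False
    return True
```

Substituting $s = \sqrt{1-t}$ shows that the first inequality is equivalent to $(s - 1)^2 (s^2 + 2s + 1/2) \ge 0$, so equality holds only at $t = 0$.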

When $M$ is thin, we distinguish two cases. Either there is a geodesic on $M$ joining $x$ and $x'$, and (Niyogi et al., 2008, Prp. 6.3) is directly applicable. Otherwise, $M$ is not geodesically convex. Let $\gamma^*$ be a geodesic on $M^*$ joining $x$ and $x'$. Necessarily, it hits the boundary $\partial M$ in at least two points. Let $z$, $z'$, $\zeta$ and $\ell$ be defined as before. We again have (29). Let $(xz)^*$ and $(z'x')^*$ denote the arcs along $\gamma^*$ joining $x$ and $z$, and $z'$ and $x'$, respectively. Applying (Niyogi et al., 2008, Prp. 6.3) to each arc, which is possible since $r_M \le \rho(M^*)$, we also have

$$\operatorname{length}((xz)^*) \le \|x - z\| + 4\|x - z\|^2/r_M, \qquad \operatorname{length}((z'x')^*) \le \|z' - x'\| + 4\|z' - x'\|^2/r_M.$$

Let $\gamma$ be the curve made by concatenating these two arcs and $\zeta$, and let $L$ denote its length. We have

$$\begin{aligned}
L &= \operatorname{length}((xz)^*) + \operatorname{length}((z'x')^*) + \ell \\
&\le \|x - z\| + \frac{4\|x - z\|^2}{r_M} + \|z' - x'\| + \frac{4\|z' - x'\|^2}{r_M} + \|z - z'\| + \frac{4\|z - z'\|^2}{r_M} \\
&\le \|x - x'\| + \frac{4\|x - x'\|^2}{r_M}.
\end{aligned}$$

This concludes the proof when $M$ is thin. □






3.2 Covering numbers and a bound on the neighborhood radius 

At what speed can we have $r_n \to 0$ and still have (13) hold? This question is of practical importance,
since the neighborhood radius may affect the output of MVU in a substantial way. Computationally, 
it is preferable to have r n small, so there are fewer constraints in (2). However, we already explained 
that r n needs to be large enough that, at the very minimum, the resulting neighborhood graph is 
connected. In fact, we required the stronger condition (13). 

To keep the exposition simple, we assume that $\mu$ is comparable to the uniform distribution on $M$, that is, we assume that there is a constant $\alpha > 0$ such that

$$\mu(B(x, \eta)) \ge \alpha \operatorname{vol}_d(B(x, \eta) \cap M), \quad \forall x \in M, \forall \eta > 0, \tag{30}$$

where $\operatorname{vol}_d$ denotes the $d$-dimensional Hausdorff measure and $d$ denotes the Hausdorff dimension of $M$. We need the following result. Let $\omega_d$ be the volume of the $d$-dimensional unit ball.

Lemma 5. Whether $M$ is thin or thick, there is $C > 0$ such that, for any $\eta \le r_M$ and any $x \in M$,

$$\operatorname{vol}_d(B(x, \eta) \cap M) \ge C\eta^d.$$

Proof. It suffices to prove the result for $x \in M \setminus \partial M$ and for $\eta$ small enough.

Thick set. We first assume that $M$ is thick. Take $x \in M$ and $\eta \le r_M$. If $\operatorname{dist}(x, \partial M) \ge \eta$, then $B(x, \eta) \subset M$ and the result follows immediately. Otherwise, let $u$ be the metric projection of $x$ onto $\partial M$, and define $z = x + (\eta/4)(x - u)/\|x - u\|$. By the triangle inequality, $B(z, \eta/4) \subset B(x, \eta)$. Also, by (Federer, 1959, Th. 4.8), $u$ is also the metric projection of $z$ onto $\partial M$, so that $\operatorname{dist}(z, \partial M) = \|z - u\| = \|x - u\| + \eta/4 \ge \eta/4$. And, necessarily, $z \in M$, for otherwise the line segment joining $z$ to $x$ would intersect $\partial M$, and any point on that intersection would be closer to $z$ than $u$ is, which cannot be. Therefore, $B(z, \eta/4) \subset B(x, \eta) \cap M$ and the result follows immediately.

Thin set. We now assume that $M$ is thin. For $y \in M$, let $T_y$ be the tangent subspace of $M$ at $y$ and let $\pi_y$ denote the orthogonal projection onto $T_y$. Because $M$ is a $C^2$ submanifold, for every $y \in M$, there is $\varepsilon_y > 0$ such that $\pi_y$ is a $C^2$ diffeomorphism on $K_y := B(y, \varepsilon_y) \cap M$, with $\pi_y^{-1}$ being 2-Lipschitz on $\pi_y(K_y)$; the latter comes from the fact that $D_y \pi_y$ is the identity map and $z \mapsto D_z \pi_y$ is continuous. Since $M$ is compact, there are $y_1, \dots, y_m \in M$, with $m < \infty$, such that $M \subset \bigcup_j B(y_j, \varepsilon_{y_j}/2)$. Let $\varepsilon = \min_j \varepsilon_{y_j}$, which is strictly positive. Let $y$ be among the $y_j$'s such that $x \in B(y, \varepsilon_y/2)$. Assuming that $\eta < \varepsilon/2$, we have that $B(x, \eta) \subset B(y, \varepsilon_y)$. Let $U := B(y, \varepsilon_y)$, $K = K_y$, $T = T_y$ and $\pi = \pi_y$ for short.

We first show that, if $\partial M \cap K \ne \emptyset$ and $W := \pi(\partial M \cap K)$, then $\rho(W) \ge \rho(\partial M)$. Indeed, for any $z, z' \in \partial M \cap K$, we have

$$\operatorname{dist}\big(\pi(z') - \pi(z), \operatorname{Tan}(W, \pi(z))\big) \le \operatorname{dist}\big(z' - z, \operatorname{Tan}(\partial M, z)\big) \le \frac{\|z' - z\|^2}{2\rho(\partial M)},$$

where the first inequality follows from the facts that $\operatorname{Tan}(W, \pi(z)) = \pi(\operatorname{Tan}(\partial M, z))$ and that $\pi$ is 1-Lipschitz, and the second inequality from (Federer, 1959, Th. 4.18) applied to $\partial M$. In turn, (Federer, 1959, Th. 4.17) applied to $W$ implies that $\rho(W) \ge \rho(\partial M)$.

We can now reason as we did for thick sets, but with a twist. To be sure, let $a = \pi(x)$ and notice that $B(a, \eta) \cap T = \pi(B(x, \eta)) \subset \pi(U)$ since $B(x, \eta) \subset U$. If $\operatorname{dist}(a, W) \ge \eta/2$, then $B(a, \eta/2) \cap T \subset \pi(K)$. If $\operatorname{dist}(a, W) < \eta/2$, let $b$ be the metric projection of $a$ onto $W$ and define $c = a + (\eta/8)(a - b)/\|a - b\|$. Arguing exactly as we did for thick sets, we have that $B(c, \eta/8) \cap T \subset B(a, \eta/2) \cap \pi(K)$. Let $L = \pi^{-1}(B(c, \eta/8) \cap T)$. Note that $L \subset \pi^{-1}(B(a, \eta/2) \cap T) \cap K \subset B(x, \eta) \cap K \subset B(x, \eta) \cap M$, since $\pi$ is injective on $K$ and $\pi^{-1}$ is 2-Lipschitz on $\pi(K)$. In addition, since $\pi$ is 1-Lipschitz on $K$, we have $\operatorname{vol}_d(L) \ge \operatorname{vol}_d(\pi(L)) = \operatorname{vol}_d(B(c, \eta/8) \cap T)$. This immediately implies the result. □
When (30) is satisfied, and $M$ is either thin or thick, we can provide sharp rates for $r_n$. Just as we did in Section 2.1, we work with coverings of $M$. Let $\mathcal{N}(M,\eta)$ denote the cardinality of a minimal $\eta$-covering of $M$ for the Euclidean norm.
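Covering numbers such as $\mathcal{N}(M,\eta)$ are rarely computed exactly; a standard upper bound comes from a greedy maximal $\eta$-packing, which is automatically an $\eta$-covering. The sketch below, an illustration that is not part of the argument, applies this to points sampled along the unit circle (a hypothetical $d = 1$ example), where the count should scale like $\eta^{-1}$.

```python
import math

def greedy_net(points, eta):
    """Greedy maximal eta-packing: keep a point if it is at distance >= eta
    from all kept centers. Every rejected point is within eta of a center,
    so the centers also form an eta-covering of the sampled points."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eta for c in centers):
            centers.append(p)
    return centers

def circle_points(n):
    """n equally spaced points on the unit circle (hypothetical stand-in for M)."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

pts = circle_points(4000)
for eta in (0.2, 0.1, 0.05):
    print(eta, len(greedy_net(pts, eta)))
```

Halving $\eta$ roughly doubles the count, consistent with $\mathcal{N}(M,\eta) \asymp \eta^{-d}$ for $d = 1$.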

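Lemma 6. Suppose $\eta < r_M$. When $M$ is thick,

$$\mathcal{N}(M,\eta) \le C \operatorname{vol}_p(M)\, \eta^{-p};$$

and when $M$ is thin and $0 \le \sigma < \rho(M)$,

$$\mathcal{N}(B(M,\sigma),\eta) \le C \operatorname{vol}_d(M) \max(\sigma,\eta)^{p-d}\, \eta^{-p}.$$

The constant $C$ depends only on $p$ and $d$.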

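Proof. Suppose $M$ is thick and let $z_1, \dots, z_{N_\eta}$ be an $\eta$-packing of $M$ of size $N_\eta := \mathcal{N}(M,\eta)$. Since $B(z_i,\eta/2) \cap B(z_j,\eta/2) = \emptyset$ when $i \ne j$, we have

$$\operatorname{vol}_p(M) \ge \sum_j \operatorname{vol}_p\big(B(z_j,\eta/2) \cap M\big) \ge N_\eta C_p \eta^p,$$

where $C_p$ is the constant in Lemma 5. The bound on $\mathcal{N}(M,\eta)$ follows.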

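Suppose $M$ is thin. When $\sigma < \eta/4$, let $z_1, \dots, z_{N_{\eta/4}}$ be an $(\eta/4)$-packing of $M$. Then by the triangle inequality, $B(M,\sigma) \subset \bigcup_j B(z_j,\eta/2)$, and therefore $\mathcal{N}(B(M,\sigma),\eta) \le N_{\eta/4}$. Clearly, it now suffices to focus on $\sigma \ge \eta$. Let $z_1, \dots, z_N$ be an $(\eta/4)$-packing of $B(M, \sigma - \eta/4)$. Since $B(z_i,\eta/8) \cap B(z_j,\eta/8) = \emptyset$ when $i \ne j$, and $B(z_j,\eta/8) \subset B(M,\sigma)$, we have

$$\operatorname{vol}_p(B(M,\sigma)) \ge \sum_j \operatorname{vol}_p(B(z_j,\eta/8)) = N \omega_p (\eta/8)^p,$$

where $\omega_p$ is the volume of the unit ball of $\mathbb{R}^p$. Hence, $N \le \omega_p^{-1} (\eta/8)^{-p} \operatorname{vol}_p(B(M,\sigma))$. By Weyl's volume formula for tubes (Weyl, 1939), we have $\operatorname{vol}_p(B(M,\sigma)) \le C_1 \operatorname{vol}_d(M)\, \sigma^{p-d}$ for a constant $C_1$ depending on $p$ and $d$. Since $B(M,\sigma) \subset \bigcup_j B(z_j,\eta/2)$, we have $\mathcal{N}(B(M,\sigma),\eta) \le N$, and the result follows. □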

We are now ready to take a closer look at (13). Let $\eta_n$ be defined as in Section 2.1. By (30) and Lemma 5, we have $p_\eta \ge C_1 \alpha \eta^d$, and by Lemma 6 we have $\mathcal{N}(M,\eta) \le C_2 \eta^{-d}$, where $C_1$ and $C_2$ depend only on $M$. Hence,

$$\mathcal{N}(M,\eta)(1 - p_\eta)^n \le C_2 \eta^{-d} (1 - C_1 \alpha \eta^d)^n \le C_2 \eta^{-d} e^{-C_1 \alpha n \eta^d} \le \frac{1}{n^2},$$

when

$$\eta^d \ge (C_1 \alpha n)^{-1} \log\big(C_2 \eta^{-d} n^2\big).$$

We deduce that any $r_n \gg \eta_n^* := (\log(n)/n)^{1/d}$ satisfies (13) with any $\lambda_n \to 0$ such that $\lambda_n \gg \eta_n^*/r_n$.
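The condition on $\eta$ above is implicit, since $\eta$ appears on both sides. A minimal sketch, with placeholder constants $C_1 = C_2 = \alpha = 1$ (in the text they depend on $M$ and are not specified), finds the smallest $\eta$ satisfying $C_1 \alpha n \eta^d \ge \log(C_2 \eta^{-d} n^2)$ by bisection and compares it with $(\log(n)/n)^{1/d}$.

```python
import math

def min_eta(n, d, C1=1.0, C2=1.0, alpha=1.0):
    """Smallest eta with C1*alpha*n*eta**d >= log(C2 * eta**(-d) * n**2),
    i.e. C2 * eta**(-d) * exp(-C1*alpha*n*eta**d) <= 1/n**2.
    C1, C2 and alpha are hypothetical placeholder constants."""
    def g(eta):
        return C1 * alpha * n * eta ** d - math.log(C2 * eta ** (-d) * n ** 2)
    lo, hi = 1e-12, 1.0
    while g(hi) < 0:          # enlarge the bracket if needed
        hi *= 2
    for _ in range(200):      # g is increasing in eta, so bisection applies
        mid = 0.5 * (lo + hi)
        if g(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return hi

d = 2
for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    eta = min_eta(n, d)
    print(n, eta, eta / (math.log(n) / n) ** (1 / d))
```

The last column stays bounded as $n$ grows, in line with the threshold scaling like $(\log(n)/n)^{1/d}$.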

3.3 Packing numbers of Lipschitz functions on M 

It appears necessary to provide a bound for $\mathcal{N}_\infty(\mathcal{F}_1, \eta)$. For this, we follow the seminal work of Kolmogorov and Tikhomirov (1961) on entropy bounds for classical function classes (including Lipschitz classes). We provide details for completeness.
Lemma 7. For any compact, connected subset $M$ of $\mathbb{R}^p$ satisfying (7), there is a constant $C$ such that

$$\log \mathcal{N}_\infty(\mathcal{F}_1, \eta) \le C\big(\log(1/\eta) + \mathcal{N}(M, \eta/C)\big),$$

for all $0 < \eta < 1$.

In particular, if $M$ is thin or thick, we have $\log \mathcal{N}_\infty(\mathcal{F}_1, \eta) \le C \eta^{-d}$ by Lemma 6 and Lemma 7.

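Proof. Take $0 < \varepsilon < 1/\sqrt{p}$ and let $C_0 = 2\sqrt{p}\,(2 + c(2))$. For $j = (j_1, \dots, j_p) \in \mathbb{Z}^p$, let $Q_j = \prod_{s=1}^p [j_s \varepsilon, (j_s+1)\varepsilon)$. Let $J = \{j : Q_j \cap M \ne \emptyset\}$, which we see as a subgraph of the lattice $\mathbb{Z}^p$ for the $2p$-nearest neighbor topology.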

Note that $|J| \le C_1 \mathcal{N}(M,\varepsilon)$. Indeed, let $e_1, \dots, e_{2^p}$ be the vertices of the unit hypercube of $\mathbb{R}^p$, let $Z_0 = (2\mathbb{Z})^p$, and let $Z_s = e_s + Z_0$. By construction, $Z_1, \dots, Z_{2^p}$ is a partition of $\mathbb{Z}^p$. Therefore, there is $s$ (say $s = 1$) such that $|J \cap Z_s| \ge |J|/2^p$. For each $j \in J \cap Z_1$, pick $x_j \in Q_j \cap M$. By construction, for any $j \ne j'$ both in $J \cap Z_1$, $\|x_j - x_{j'}\| \ge 2\varepsilon$, so $|J \cap Z_1|$ is smaller than the $2\varepsilon$-packing number of $M$, which is smaller than the $\varepsilon$-covering number of $M$.

Note also that $\bigcup_{j \in J} Q_j$ is connected because $M$ is. Let $\pi_1, \dots, \pi_\ell$ be a sequence covering $J$ and such that $Q_{\pi_i}$ and $Q_{\pi_{i+1}}$ are adjacent. A depth-first construction gives a sequence $\pi$ of length at most $\ell \le C_2 |J|$, since each $Q_j$ has a constant number ($= 2p$) of adjacent hypercubes.
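The index set $J$ and the walk $\pi$ can be made concrete. In this sketch, $M$ is replaced by a finite grid of points in $[0,1]^2$ (a hypothetical stand-in), $J$ collects the indices of occupied cubes $Q_j$, and a depth-first traversal of the adjacency graph (cubes sharing a facet, i.e. the $2p$ lattice neighbors) produces a sequence in which consecutive cubes are adjacent. Recording the backtracking steps gives a walk of length $2|J| - 1$ when $J$ is connected, which is one admissible choice of the constant ($C_2 = 2$).

```python
import math

def occupied_cubes(points, eps):
    """J = {j in Z^p : Q_j meets the sample}, with Q_j = prod_s [j_s*eps, (j_s+1)*eps)."""
    return {tuple(math.floor(c / eps) for c in x) for x in points}

def neighbors(j):
    """The 2p lattice neighbors of j (cubes sharing a facet with Q_j)."""
    for s in range(len(j)):
        for step in (-1, 1):
            k = list(j)
            k[s] += step
            yield tuple(k)

def covering_walk(J):
    """Iterative depth-first walk over J (assumed connected); consecutive
    entries are adjacent cubes, and backtracking steps are recorded."""
    start = min(J)
    walk, seen = [start], {start}
    stack = [(start, neighbors(start))]
    while stack:
        j, it = stack[-1]
        for k in it:
            if k in J and k not in seen:
                seen.add(k)
                walk.append(k)
                stack.append((k, neighbors(k)))
                break
        else:                      # all neighbors of j explored
            stack.pop()
            if stack:
                walk.append(stack[-1][0])  # step back to the parent cube
    return walk

pts = [(i / 50, k / 50) for i in range(51) for k in range(51)]
J = occupied_cubes(pts, 0.1)
walk = covering_walk(J)
print(len(J), len(walk))
```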

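Let $y_1, \dots, y_m$ be an enumeration of the $\varepsilon$-grid $(\varepsilon\mathbb{Z} \cap [-\operatorname{diam}(M), \operatorname{diam}(M)])^p$. Note that $m \le C_3 \varepsilon^{-p}$ and that, for each $s$, there are at most $C_4$ indices $t$ such that $\|y_s - y_t\| \le C_0 \varepsilon$.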

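Consider the class $\mathcal{G}$ of piecewise-constant functions $g : M \to \mathbb{R}^p$ of the form $g(x) = y_{t_j}$ for all $x \in Q_j \cap M$ and such that $\|y_{t_j} - y_{t_k}\| \le C_0 \varepsilon$ when $Q_j$ and $Q_k$ are adjacent. This is a subclass of the class of functions of the form $g(x) = y_{t_{\pi_i}}$ for all $x \in Q_{\pi_i} \cap M$ and such that $\|y_{t_{\pi_i}} - y_{t_{\pi_{i-1}}}\| \le C_0 \varepsilon$.

The cardinality of the larger class is at most $m C_4^{\ell - 1}$, since there are $m$ possible values for $y_{t_{\pi_1}}$ and then, at each step along $\pi$, there are at most $C_4$ choices. Therefore,

$$\begin{aligned}
\log |\mathcal{G}| &\le \log m + \ell \log C_4 \\
&\le \log(C_3) + p \log(1/\varepsilon) + C_2 C_1 \mathcal{N}(M,\varepsilon) \log(C_4) \\
&\le C_5 \big(\log(1/\varepsilon) + \mathcal{N}(M,\varepsilon)\big).
\end{aligned}$$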

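For each $j$, choose $z_j \in Q_j \cap M$. Take any $f \in \mathcal{F}_1$. For each $j$, let $t_j$ be such that $\|f(z_j) - y_{t_j}\| \le \sqrt{p}\,\varepsilon$ and let $g$ be defined by $g(x) = y_{t_j}$ for all $x \in Q_j$. Suppose $Q_j$ and $Q_k$ are adjacent, so that $\|z_j - z_k\| \le 2\sqrt{p}\,\varepsilon < 2$. By the triangle inequality, (6) and (7), we have

$$\begin{aligned}
\|y_{t_j} - y_{t_k}\| &\le \|f(z_j) - f(z_k)\| + \|y_{t_j} - f(z_j)\| + \|y_{t_k} - f(z_k)\| \\
&\le \big(1 + c(\|z_j - z_k\|)\big)\|z_j - z_k\| + \sqrt{p}\,\varepsilon + \sqrt{p}\,\varepsilon \\
&\le (1 + c(2))\, 2\sqrt{p}\,\varepsilon + 2\sqrt{p}\,\varepsilon \\
&= C_0 \varepsilon,
\end{aligned}$$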

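so that $g \in \mathcal{G}$. Moreover, for $x \in Q_j \cap M$,

$$\|g(x) - f(x)\| \le \|y_{t_j} - f(z_j)\| + \|f(z_j) - f(x)\| \le \sqrt{p}\,\varepsilon + \big(1 + c(\sqrt{p}\,\varepsilon)\big)\sqrt{p}\,\varepsilon \le (2 + c(1))\sqrt{p}\,\varepsilon.$$

The result follows from choosing $\varepsilon = \eta/\big((2 + c(1))\sqrt{p}\big)$. □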
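The approximation step can be checked numerically in the simplest thick case. The sketch below makes hypothetical choices: $M = [0,1]^2$ (convex, so $\delta_M$ is Euclidean and one may take $c \equiv 0$), a 1-Lipschitz map $f$ chosen for illustration, and $\varepsilon = 0.05$. For each cube it picks $z_j \in Q_j \cap M$ (the clipped center), rounds $f(z_j)$ to the $\varepsilon$-grid, and then checks the two bounds used above: $\|y_{t_j} - y_{t_k}\| \le C_0 \varepsilon$ for adjacent cubes, and $\|g - f\|_\infty \le (2 + c(1))\sqrt{p}\,\varepsilon$ on a fine grid of sample points.

```python
import math

EPS = 0.05                          # grid size epsilon (hypothetical)
P = 2                               # ambient dimension p
C0 = 2 * math.sqrt(P) * (2 + 0.0)   # C_0 = 2 sqrt(p) (2 + c(2)) with c = 0

def f(x):
    """Hypothetical 1-Lipschitz map [0,1]^2 -> R^2 (Lipschitz constant <= 1/sqrt(2))."""
    return (0.5 * math.sin(x[0] + x[1]), 0.5 * math.cos(x[0] - x[1]))

def cube(x):
    """Index j of the cube Q_j containing x."""
    return tuple(math.floor(c / EPS) for c in x)

def snap(v):
    """Nearest point of the grid (EPS * Z)^p; error <= sqrt(p) * EPS / 2."""
    return tuple(EPS * round(c / EPS) for c in v)

# J: indices of cubes meeting M = [0,1]^2; z_j: cube center clipped into M
K = math.ceil(1 / EPS)
J = [(a, b) for a in range(K + 1) for b in range(K + 1)]
z = {j: tuple(min((c + 0.5) * EPS, 1.0) for c in j) for j in J}
y = {j: snap(f(z[j])) for j in J}   # g(x) = y[cube(x)]

# bound 1: values on adjacent cubes differ by at most C0 * EPS
max_step = max(math.dist(y[(a, b)], y[(a + da, b + db)])
               for (a, b) in J for (da, db) in ((1, 0), (0, 1))
               if (a + da, b + db) in y)

# bound 2: sup-norm error on a fine grid of M, to compare with (2 + c(1)) sqrt(p) EPS
grid = [i / 200 for i in range(201)]
sup_err = max(math.dist(y[cube((u, v))], f((u, v))) for u in grid for v in grid)

print(max_step, C0 * EPS)
print(sup_err, 2 * math.sqrt(P) * EPS)
```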
3.4 Quantitative convergence bound

From (25) and Lemma 7, there is a constant $C > 0$ such that

$$\mathbb{P}\Big(\sup_{f \in \mathcal{F}_1} \big|\mathcal{E}(Y_n(f)) - \mathcal{E}(f)\big| > C n^{-1/(d+2)}\Big) \le \exp\big(-n^{d/(d+2)}\big).$$

Using this fact in (27), together with Lemma 4 and the order of magnitude for $r_n$ derived in Section 3.2, leads to a bound on the rate of convergence in (9) via the Borel-Cantelli Lemma.

Theorem 2. Suppose that $M$ is either thin or thick, of dimension $d$, and that (30) holds. Assume that $r_n \to 0$ such that $r_n \gg r_n^* := (\log(n)/(\alpha n))^{1/d}$ and take any $a_n \to \infty$. Then, with probability one,

$$\Big|\sup\{\mathcal{E}(Y) : Y \in \mathcal{Y}_{n,r_n}\} - \sup\{\mathcal{E}(f) : f \in \mathcal{F}_1\}\Big| \le a_n\Big(r_n + \frac{r_n^*}{r_n} + n^{-1/(d+2)}\Big),$$

for $n$ large enough.
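Up to the factor $a_n$, the bound in Theorem 2 trades $r_n$ against $r_n^*/r_n$. For fixed $r_n^*$, the map $r \mapsto r + r_n^*/r$ is minimized at $r = \sqrt{r_n^*}$, with minimum $2\sqrt{r_n^*}$, and this choice does satisfy $r_n \gg r_n^*$. The sketch below evaluates the bound with placeholder values $\alpha = 1$ and $a_n = 1$ (both hypothetical; $a_n$ must in fact diverge). Heuristically, for $d \ge 2$ the term $2\sqrt{r_n^*}$ dominates $n^{-1/(d+2)}$, while for $d = 1$ the term $n^{-1/3}$ does.

```python
import math

def theorem2_bound(n, d, r, alpha=1.0, a_n=1.0):
    """a_n * (r + r_star/r + n^(-1/(d+2))); alpha and a_n are placeholders."""
    r_star = (math.log(n) / (alpha * n)) ** (1 / d)
    return a_n * (r + r_star / r + n ** (-1 / (d + 2)))

def best_radius(n, d, alpha=1.0):
    """Minimizer of r + r_star/r over r > 0, namely r = sqrt(r_star)."""
    r_star = (math.log(n) / (alpha * n)) ** (1 / d)
    return math.sqrt(r_star)

n, d = 10 ** 6, 2
r_opt = best_radius(n, d)
for r in (r_opt / 2, r_opt, 2 * r_opt):
    print(r, theorem2_bound(n, d, r))
```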

Unfortunately, we do not have a quantitative bound on the rate of convergence of the solutions 
in (10). 



4 Continuum MVU 



Now that we have established the convergence of Discrete MVU to Continuum MVU, we study the latter,
and in particular its solutions. We mostly focus on the case where M is isometric to a Euclidean 
domain. 

Isometry assumption. We assume that $M$ is isometric to a compact, connected domain $D \subset \mathbb{R}^d$. Specifically, there is a bijection $\psi : M \to D$ satisfying $\delta_D(\psi(x), \psi(x')) = \delta_M(x, x')$ for all $x, x' \in M$.

As a glimpse of the complexity of the notion of isometry, and also for further reference, consider a domain $D$ as above. Then the canonical inclusion $\iota$ of $D$ in $\mathbb{R}^d$ is not necessarily an isometry between the metric spaces $(D, \delta_D)$ and $(\mathbb{R}^d, \|\cdot\|)$. To see this, let $x$ and $x'$ be two points of $D$, and let $\gamma$ be a shortest path connecting $x$ to $x'$ in $D$. Suppose that $\iota : (D, \delta_D) \to (\mathbb{R}^d, \|\cdot\|)$ is an isometry. Then $L(\iota \circ \gamma) = L(\gamma) = \delta_D(x, x') = \|\iota(x) - \iota(x')\|$. So the image path $\iota \circ \gamma$ is a shortest path connecting $\iota(x)$ to $\iota(x')$, hence a segment. Since this segment lies in $\iota(D) = D$, and since this holds for any pair of points $x, x'$ in $D$, this implies that $D$ is convex. Conversely, if $D$ is convex, the canonical inclusion $\iota$ is an isometry.
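The argument can be seen on a simple nonconvex example. For the hypothetical L-shaped domain $D = [0,2]^2 \setminus (1,2]^2$, whose only reflex corner is $c = (1,1)$, a shortest path in $D$ between two points is the straight segment when that segment stays in $D$, and otherwise the broken line through $c$. The sketch below uses this fact, which is specific to this domain, to compare $\delta_D$ with the Euclidean distance; the domain and function names are illustrative only.

```python
import math

CORNER = (1.0, 1.0)   # the only reflex corner of the hypothetical L-shape

def crosses_notch(x, y):
    """Exact test of whether the segment [x, y], with x, y in
    D = [0,2]^2 minus (1,2]^2, meets the removed square (1,2] x (1,2]."""
    lo, hi = -1.0, 2.0            # need some t in [0, 1] with lo < t < hi
    for a, b in zip(x, y):
        if a == b:
            if a <= 1:
                return False      # this coordinate never exceeds 1
            continue
        t = (1 - a) / (b - a)     # parameter at which this coordinate equals 1
        if b > a:
            lo = max(lo, t)       # coordinate > 1 exactly when t > t_star
        else:
            hi = min(hi, t)       # coordinate > 1 exactly when t < t_star
    return lo < hi and lo < 1 and hi > 0

def delta_D(x, y):
    """Intrinsic distance in the L-shape: straight if possible, else via the corner."""
    if crosses_notch(x, y):
        return math.dist(x, CORNER) + math.dist(CORNER, y)
    return math.dist(x, y)

x, y = (0.5, 2.0), (2.0, 0.5)
print(delta_D(x, y), math.dist(x, y))   # intrinsic distance exceeds the Euclidean one
```

So the inclusion of this $D$ in $\mathbb{R}^2$ is not an isometry, as the argument predicts for nonconvex domains.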

We start by showing that, in the case where $M$ is isometric to a convex domain, MVU recovers this convex domain modulo a rigid transformation, so that MVU is consistent in that case. The last part of the section is dedicated to a perturbation analysis that shows two things: first, that Continuum MVU changes slowly with the amount of noise, up to a point; and second, that when $M$ is isometric to a domain that is not convex, MVU may not recover this domain. We provide some illustrative examples of the latter.

In the following, we identify $\mathbb{R}^d$ with $\mathbb{R}^d \times \{0\}^{p-d} \subset \mathbb{R}^p$.



4.1 Consistency under the convex assumption 

If we assume that $D$ is convex, then MVU recovers $D$ up to a rigid transformation, in the following sense. Recall that $\mathcal{S}_1$ is the solution space of Continuum MVU.
Theorem 3. Suppose that $M$ is isometric to a convex subset $D \subset \mathbb{R}^d$ with isometry mapping $\psi : M \to D$, and that (30) holds. Then

$$\mathcal{S}_1 = \{\zeta \circ \psi : \zeta \in \operatorname{Isom}(\mathbb{R}^p)\}.$$

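Proof. Note first that, since $D$ is convex, its intrinsic distance coincides with the Euclidean distance of $\mathbb{R}^d$, i.e., $\delta_D = \|\cdot\|$. For all $f$ in $\mathcal{F}_1$, we have

$$\begin{aligned}
\mathcal{E}(f) &= \int_{M \times M} \|f(x) - f(x')\|^2 \mu(dx)\mu(dx') \\
&\le \int_{M \times M} \delta_M(x,x')^2 \mu(dx)\mu(dx') \\
&= \int_{M \times M} \delta_D(\psi(x),\psi(x'))^2 \mu(dx)\mu(dx') \\
&= \int_{M \times M} \|\psi(x) - \psi(x')\|^2 \mu(dx)\mu(dx') \\
&= \int_{D \times D} \|z - z'\|^2 (\mu \circ \psi^{-1})(dz)(\mu \circ \psi^{-1})(dz'),
\end{aligned}$$

while $\psi \in \mathcal{F}_1$, since $\|\psi(x) - \psi(x')\| = \delta_D(\psi(x),\psi(x')) = \delta_M(x,x')$, and

$$\mathcal{E}(\psi) = \int_{D \times D} \|z - z'\|^2 (\mu \circ \psi^{-1})(dz)(\mu \circ \psi^{-1})(dz').$$

So

$$\sup_{f \in \mathcal{F}_1} \mathcal{E}(f) = \mathcal{E}(\psi) = \int_{D \times D} \|z - z'\|^2 (\mu \circ \psi^{-1})(dz)(\mu \circ \psi^{-1})(dz').$$

Hence $\psi \in \mathcal{S}_1$, and since $\mathcal{E}(\zeta \circ \psi) = \mathcal{E}(\psi)$ for any isometry $\zeta : \mathbb{R}^p \to \mathbb{R}^p$,

$$\{\zeta \circ \psi : \zeta \in \operatorname{Isom}(\mathbb{R}^p)\} \subset \mathcal{S}_1.$$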

Now let $f : M \to \mathbb{R}^p$ be a function in $\mathcal{F}_1$, so that $\|f(x) - f(x')\| \le \delta_M(x,x')$ for any points $x$ and $x'$ in $M$. Suppose that $f$ is not an isometry. Then there exist two points $x$ and $x'$ in $M$ such that

$$\|f(x) - f(x')\| < \delta_M(x,x').$$

By continuity of $f$ and of $\delta_M$, there exists a nonempty open subset $U$ of $M \times M$ containing $(x,x')$ such that $\|f(z) - f(z')\| < \delta_M(z,z')$ for all $(z,z')$ in $U$. In addition, $(\mu \otimes \mu)(U) > 0$ by (30). Consequently

$$\begin{aligned}
\mathcal{E}(f) &= \int_{(M \times M) \setminus U} \|f(x) - f(x')\|^2 \mu(dx)\mu(dx') + \int_U \|f(x) - f(x')\|^2 \mu(dx)\mu(dx') \\
&< \int_{M \times M} \delta_M(x,x')^2 \mu(dx)\mu(dx') \\
&= \sup_{\mathcal{F}_1} \mathcal{E}.
\end{aligned}$$

So any function $f$ in $\mathcal{F}_1$ which is not an isometry onto its image does not belong to $\mathcal{S}_1$.

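At last, for any isometry $f$ in $\mathcal{S}_1$, the map $f \circ \psi^{-1} : D \to \mathbb{R}^p$ is an isometry (for the Euclidean distance on $D$, since $\delta_D = \|\cdot\|$), so there exists some isometry $\zeta \in \operatorname{Isom}(\mathbb{R}^p)$ such that $f = \zeta \circ \psi$, and we conclude that

$$\{\zeta \circ \psi : \zeta \in \operatorname{Isom}(\mathbb{R}^p)\} = \mathcal{S}_1.$$

□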
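The energy in this proof, $\mathcal{E}(f) = \int\int \|f(x) - f(x')\|^2 \mu(dx)\mu(dx')$, equals twice the total variance of $f(X)$ for $X \sim \mu$. A minimal Monte Carlo sketch, assuming the hypothetical case $M = D = [0,1]^2$ with $\mu$ uniform and $\psi$ the identity, illustrates two facts used above: a rigid motion $\zeta \circ \psi$ has the same energy as $\psi$ (here $2(1/12 + 1/12) = 1/3$), while a 1-Lipschitz map that is not an isometry, such as folding the square along $x_1 = 1/2$, has strictly smaller energy (here $2(1/48 + 1/12) = 5/24$). The maps `rigid` and `fold` are illustrative choices.

```python
import math
import random

def energy(f, xs):
    """Empirical energy (1/n^2) sum_{i,j} ||f(x_i) - f(x_j)||^2, computed exactly via
    the identity 2 * (mean ||f||^2 - ||mean f||^2) to avoid the O(n^2) double sum."""
    ys = [f(x) for x in xs]
    n = len(ys)
    m0 = sum(y[0] for y in ys) / n
    m1 = sum(y[1] for y in ys) / n
    second = sum(y[0] ** 2 + y[1] ** 2 for y in ys) / n
    return 2 * (second - m0 ** 2 - m1 ** 2)

def psi(x):                          # the isometry onto D (identity in this example)
    return x

def rigid(x, th=0.7, b=(3.0, -1.0)): # zeta ∘ psi: rotation by th, then translation by b
    return (math.cos(th) * x[0] - math.sin(th) * x[1] + b[0],
            math.sin(th) * x[0] + math.cos(th) * x[1] + b[1])

def fold(x):                         # 1-Lipschitz, but not an isometry
    return (0.5 - abs(x[0] - 0.5), x[1])

rng = random.Random(1)
xs = [(rng.random(), rng.random()) for _ in range(100_000)]
print(energy(psi, xs), energy(rigid, xs), energy(fold, xs))
```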

In conclusion, MVU recovers the isometry when the domain $D$ is convex. Note that this is also the case for ISOMAP.
4.2 Noisy setting

When the setting is noisy, with noise level $\sigma > 0$, $X_1, \dots, X_n$ are sampled from $\mu_\sigma$, a (Borel) probability distribution on $\mathbb{R}^p$ with support $M_\sigma := B(M,\sigma)$, i.e., $M_\sigma$ is composed of all the points of $\mathbb{R}^p$ that are at a distance at most $\sigma$ from $M$. To speak of noise stability, we assume that $\mu_\sigma$ converges weakly when $\sigma \to 0$. Let $\mathcal{F}_{1,\sigma}$ denote the class of 1-Lipschitz functions on $M_\sigma$, and so on. Our simple perturbation analysis is plainly based on the fact that $\mathcal{E}$ is continuous with respect to the noise level, in the following sense. This immediately implies that MVU is tolerant to noise.

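Lemma 8. Let $M \subset \mathbb{R}^p$ be of positive reach $\rho(M) > 0$ and assume that $\mu_\sigma \to \mu_0$ weakly when $\sigma \to 0$. Then as $\sigma \to 0$, we have

$$\sup_{f \in \mathcal{F}_{1,\sigma}} \mathcal{E}_\sigma(f) \to \sup_{f \in \mathcal{F}_1} \mathcal{E}(f), \qquad (31)$$

and

$$\sup_{f \in \mathcal{S}_{1,\sigma}} \inf_{g \in \mathcal{S}_1} \sup_{x \in M_\sigma} \inf_{z \in M} \|f(x) - g(z)\| \to 0. \qquad (32)$$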



Proof. The metric projection $\pi : B(M, \rho(M)) \to M$ with $\pi(x) = \operatorname{argmin}\{\|x - x'\| : x' \in M\}$ is well-defined and 1-Lipschitz (Federer, 1959, Th. 4.8).

Consider any sequence $\sigma_m \to 0$ with $\sigma_m < \rho(M)$ for all $m \ge 1$, and let $f_m \in \mathcal{S}_{1,\sigma_m}$. Let $g_m$ denote the restriction of $f_m$ to $M$. Since $(g_m) \subset \mathcal{F}_1$ and $\mathcal{F}_1$ is compact for the supnorm, $(g_m)$ admits a convergent subsequence. Assume $(g_m)$ itself is convergent, without loss of generality. Then $g_m \to g_*$, with $g_* \in \mathcal{F}_1$. For $x \in B(M,\rho(M))$, define $f_*(x) = g_*(\pi(x))$. Then for $x \in M_{\sigma_m}$, we have

$$\begin{aligned}
\|f_*(x) - f_m(x)\| &\le \|g_*(\pi(x)) - g_m(\pi(x))\| + \|f_m(\pi(x)) - f_m(x)\| \\
&\le \|g_* - g_m\|_\infty + \|\pi(x) - x\| \\
&\le \|g_* - g_m\|_\infty + \sigma_m,
\end{aligned}$$

since $f_m \in \mathcal{F}_{1,\sigma_m}$ and the segment $[\pi(x), x] \subset M_{\sigma_m}$. The latter is due to $\|\pi(x) - x\| \le \sigma_m$ and $B(\pi(x), \sigma_m) \subset M_{\sigma_m}$, both by definition. Hence, as functions on $M_{\sigma_m}$, we have $\|f_* - f_m\|_\infty \to 0$, i.e.,

$$\sup_{x \in M_{\sigma_m}} \|f_*(x) - f_m(x)\| \to 0.$$

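By (20), again applied to functions on $M_{\sigma_m}$ for a fixed $m$, we have

$$\begin{aligned}
|\mathcal{E}_{\sigma_m}(f_m) - \mathcal{E}_{\sigma_m}(f_*)| &\le 4 \|f_* - f_m\|_\infty \operatorname{diam}(M_{\sigma_m}) \\
&\le 4 \|f_* - f_m\|_\infty \operatorname{diam}(B(M,\rho(M))) \\
&\to 0,
\end{aligned}$$

and since $f_*$ does not depend on $m$ and is bounded and continuous, we also have

$$\mathcal{E}_{\sigma_m}(f_*) \to \mathcal{E}(f_*) = \mathcal{E}(g_*) \le \sup_{\mathcal{F}_1} \mathcal{E}. \qquad (33)$$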



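Hence

$$\begin{aligned}
\sup_{\mathcal{F}_{1,\sigma_m}} \mathcal{E}_{\sigma_m} &= \mathcal{E}_{\sigma_m}(f_m) \\
&= \mathcal{E}(f_*) + \mathcal{E}_{\sigma_m}(f_*) - \mathcal{E}(f_*) + \mathcal{E}_{\sigma_m}(f_m) - \mathcal{E}_{\sigma_m}(f_*) \\
&\le \sup_{\mathcal{F}_1} \mathcal{E} + \mathcal{E}_{\sigma_m}(f_*) - \mathcal{E}(f_*) + \mathcal{E}_{\sigma_m}(f_m) - \mathcal{E}_{\sigma_m}(f_*),
\end{aligned}$$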
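and we deduce that

$$\limsup_{m \to \infty} \sup_{\mathcal{F}_{1,\sigma_m}} \mathcal{E}_{\sigma_m} \le \sup_{\mathcal{F}_1} \mathcal{E},$$

and since this is true for all sequences $\sigma_m \to 0$ (the condition $\sigma_m < \rho(M)$ holds for $m$ large enough), we have

$$\limsup_{\sigma \to 0} \sup_{\mathcal{F}_{1,\sigma}} \mathcal{E}_\sigma \le \sup_{\mathcal{F}_1} \mathcal{E}.$$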

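For the reverse relation, choose $g \in \mathcal{S}_1$ and for $x \in B(M, \rho(M))$ define $f(x) = g(\pi(x))$. As above, let $\sigma_m \to 0$ with $\sigma_m < \rho(M)$. Then $f \in \mathcal{F}_{1,\sigma_m}$ by composition, so that

$$\mathcal{E}_{\sigma_m}(f) \le \sup_{\mathcal{F}_{1,\sigma_m}} \mathcal{E}_{\sigma_m}.$$

On the other hand, $f$ is bounded and continuous, so that $\mathcal{E}_{\sigma_m}(f) \to \mathcal{E}(f)$ as in (33), and

$$\mathcal{E}(f) = \mathcal{E}(g) = \sup_{\mathcal{F}_1} \mathcal{E}.$$

Hence,

$$\sup_{\mathcal{F}_1} \mathcal{E} \le \liminf_{m \to \infty} \sup_{\mathcal{F}_{1,\sigma_m}} \mathcal{E}_{\sigma_m}.$$

This concludes the proof of (31).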

Equation (32) is now proved based on (31) in the same way (10) is proved based on (9), by contradiction. To be sure, assume (32) is not true. Then it is also not true for $\mathcal{S}^0_{1,\sigma}$ and $\mathcal{S}^0_1$. Hence, there is $\varepsilon > 0$, a sequence $\sigma_m \to 0$ and $f_m \in \mathcal{S}^0_{1,\sigma_m}$ such that

$$\inf_{g \in \mathcal{S}^0_1} \sup_{x \in M_{\sigma_m}} \inf_{z \in M} \|f_m(x) - g(z)\| > \varepsilon,$$

for infinitely many $m$'s. Without loss of generality, we assume this is true for all $m$. For each $m$, let $g_m$ be the restriction of $f_m$ to $M$. Then, taking a subsequence if needed, $g_m \to g_* \in \mathcal{F}_1$ in supnorm. As before, define $f_*(x) = g_*(\pi(x))$ for $x \in B(M,\rho(M))$. Following the same arguments, we have

$$\sup_{x \in M_{\sigma_m}} \|f_*(x) - f_m(x)\| \to 0.$$

We also see that, necessarily, $g_* \in \mathcal{S}^0_1$, for otherwise the inequality in (33) would be strict and this would imply that (31) does not hold. Hence

$$\sup_{x \in M_{\sigma_m}} \|f_*(x) - f_m(x)\| \ge \sup_{x \in M_{\sigma_m}} \inf_{z \in M} \|f_m(x) - g_*(z)\| \ge \inf_{g \in \mathcal{S}^0_1} \sup_{x \in M_{\sigma_m}} \inf_{z \in M} \|f_m(x) - g(z)\| > \varepsilon.$$

This leads to a contradiction. Hence the proof of (32) is complete. □
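A simple closed-form instance of the weak-convergence step behind (33), for a fixed bounded continuous map: hypothetically take $M$ the unit circle of $\mathbb{R}^2$ with $\mu$ uniform, $\mu_\sigma$ uniform on the annulus $M_\sigma$ (with $\sigma < 1 = \rho(M)$), and the identity map. By symmetry the mean is $0$, and $\|X\|^2$ is uniform on $[(1-\sigma)^2, (1+\sigma)^2]$, so $\mathcal{E}_\sigma(\mathrm{id}) = 2\,\mathbb{E}\|X\|^2 = 2(1 + \sigma^2)$, which tends to $\mathcal{E}(\mathrm{id}) = 2$. The sketch checks this closed form by Monte Carlo; the example is not taken from the text.

```python
import math
import random

def energy_identity_annulus(sigma, n=200_000, seed=0):
    """Monte Carlo estimate of E_sigma(id) = 2 * (E||X||^2 - ||E X||^2),
    X uniform on the annulus {1 - sigma <= ||x|| <= 1 + sigma}."""
    rng = random.Random(seed)
    r_in2, r_out2 = (1 - sigma) ** 2, (1 + sigma) ** 2
    sx = sy = s2 = 0.0
    for _ in range(n):
        r = math.sqrt(r_in2 + rng.random() * (r_out2 - r_in2))  # area-uniform radius
        th = 2 * math.pi * rng.random()
        x, y = r * math.cos(th), r * math.sin(th)
        sx += x
        sy += y
        s2 += x * x + y * y
    return 2 * (s2 / n - (sx / n) ** 2 - (sy / n) ** 2)

for sigma in (0.5, 0.2, 0.05):
    print(sigma, energy_identity_annulus(sigma), 2 * (1 + sigma ** 2))
```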



4.3 Inconsistencies 

We provide two emblematic situations where MVU fails to recover $D$. They are both consequences of MVU's robustness to noise. In both cases, we consider the simplest situation where $M = D \subset \mathbb{R}^2$ and $\mu$ is the uniform distribution. Note that $\psi$ is the identity function in this case, i.e., $\psi(x) = x$, and the Isometry Assumption is clearly satisfied. We use the same notation as in Section 4.2 and let $\mu_\sigma$ denote the uniform distribution on $M_\sigma$.

Nonconvex without holes. Suppose $M_0 \subset \mathbb{R}^2$ is a curve homeomorphic to a line segment, but different from a line segment, and for $\sigma > 0$, let $M_\sigma$ be the (closed) $\sigma$-neighborhood of $M_0$. We show that there is a numeric constant $\sigma_0 > 0$ such that, when $\sigma < \sigma_0$, $\psi$ does not maximize the energy $\mathcal{E}_\sigma$. To see this, we use Lemma 8 to assert that $\mathcal{S}_{1,\sigma} \to \mathcal{S}_{1,0}$ in the sense of (32), and that $\psi \notin \mathcal{S}_{1,0}$, because $\mathcal{S}_{1,0}$ is made of all the functions that map $M_0$ isometrically onto a line segment. So there is $\sigma_0 > 0$ such that $\psi \notin \mathcal{S}_{1,\sigma}$ for all $\sigma < \sigma_0$. This also implies that no rigid transformation of $\mathbb{R}^2$ is part of $\mathcal{S}_{1,\sigma}$. If we now let $D = M = M_\sigma$ for some $0 < \sigma < \sigma_0$, we see that we do not recover $D$ up to a rigid transformation.
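To see concretely why a bent curve loses to its unfolding, consider a hypothetical $M_0$ made of two unit segments meeting at a right angle, with $\mu$ the uniform (arc-length) distribution on it. Mapping it isometrically onto a segment of length 2 gives energy $2 \cdot 4/12 = 2/3$, while the identity gives $2(1/3 - 1/8) = 5/12$. The sketch evaluates both energies with a midpoint rule, using the fact that the energy is twice the total variance; the curve is an illustrative choice.

```python
def bent(s):
    """Arc-length parametrization of the hypothetical M_0: (1,0) -> (0,0) -> (0,1)."""
    return (1.0 - s, 0.0) if s <= 1.0 else (0.0, s - 1.0)

def unfolded(s):
    """Isometric unfolding of M_0 onto the segment [0,2] x {0}."""
    return (s, 0.0)

def energy(curve, n=100_000, length=2.0):
    """2 * total variance of curve(S), S uniform on [0, length] (midpoint rule)."""
    pts = [curve((k + 0.5) * length / n) for k in range(n)]
    m0 = sum(p[0] for p in pts) / n
    m1 = sum(p[1] for p in pts) / n
    second = sum(p[0] ** 2 + p[1] ** 2 for p in pts) / n
    return 2 * (second - m0 ** 2 - m1 ** 2)

print(energy(bent), energy(unfolded))   # identity on M_0 versus its unfolding
```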

Convex boundary and convex hole. Let $K_a$ denote the axis-aligned ellipse of $\mathbb{R}^2$ with semi-major axis length equal to $a$ and perimeter equal to $2\pi$. Note that, necessarily, $1 \le a \le \pi/2$, with the extreme cases being the unit circle ($a = 1$) and the interval $[-\pi/2, \pi/2]$ swept twice ($a = \pi/2$). Denote by $b = b(a)$ the semi-minor axis length of $K_a$, implicitly defined by

$$\int_0^{2\pi} \sqrt{a^2 \sin^2 t + b^2 \cos^2 t}\; dt = 2\pi.$$

We have

$$F(a) := \int_{K_a} \|x\|^2 \, dx = \int_0^{2\pi} (a^2 \cos^2 t + b^2 \sin^2 t) \sqrt{a^2 \sin^2 t + b^2 \cos^2 t}\; dt,$$

where $dx$ denotes the arc-length measure on $K_a$.

This daunting expression simplifies considerably when $a = 1$, in which case it is equal to $2\pi$, and when $a = \pi/2$, in which case it is equal to $\pi^3/6$. Since the former is larger than the latter, and $F$ is continuous in $a$, there is $a_*$ such that, for $a > a_*$, $F(a) < F(1)$. (We actually believe that $a_* = 1$.)

Fix $a \in (a_*, \pi/2)$ and let $M_0 = K_a = \phi^{-1}(K_1)$, where $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ sends $x = (x_1, x_2)$ to $\phi(x) = (x_1/a, x_2/b)$. Note that $K_1$ is the unit circle. By the previous calculations and our choice for $a$, the identity function $\psi$ is not part of $\mathcal{S}_{1,0}$, since

$$\mathcal{E}_0(\psi) = \frac{1}{\pi} \int_{M_0} \|x\|^2 dx = \frac{1}{\pi} F(a) < \frac{1}{\pi} F(1) = 2 = \frac{1}{\pi} \int_{M_0} \|\phi(x)\|^2 dx = \mathcal{E}_0(\phi).$$

As before, let $M_\sigma$ be the (closed) $\sigma$-neighborhood of $M_0$. Again, there is a numeric constant $\sigma_0 > 0$ such that, when $\sigma < \sigma_0$, $\psi$ does not maximize the energy $\mathcal{E}_\sigma$, and we conclude again that if $D = M = M_\sigma$, MVU does not recover $D$ up to a rigid transformation.
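The quantities $b(a)$ and $F(a)$ are easy to evaluate numerically. A minimal sketch: solve the perimeter equation for $b$ by bisection (the perimeter is nondecreasing in $b$), then integrate $F(a)$ with a midpoint rule. At $a = 1$ it returns $F = 2\pi$, and at $a = \pi/2$ (where $b = 0$) it returns approximately $\pi^3/6$, the value for the segment $[-\pi/2, \pi/2]$ swept twice. Printing $F$ on a grid of $a$ values is one way to probe the conjecture $a_* = 1$; it is not a proof, and the grid and tolerances are arbitrary choices.

```python
import math

def perimeter(a, b, n=4000):
    """Perimeter of the ellipse with semi-axes a, b (midpoint rule on [0, 2pi])."""
    h = 2 * math.pi / n
    return h * sum(math.sqrt(a * a * math.sin(t) ** 2 + b * b * math.cos(t) ** 2)
                   for t in ((k + 0.5) * h for k in range(n)))

def semi_minor(a, tol=1e-10):
    """b(a) in [0, a] such that the perimeter equals 2*pi (bisection)."""
    lo, hi = 0.0, a
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if perimeter(a, mid) < 2 * math.pi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def F(a, b, n=4000):
    """F(a) = int_0^{2pi} (a^2 cos^2 t + b^2 sin^2 t) sqrt(a^2 sin^2 t + b^2 cos^2 t) dt."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        s, c = math.sin(t), math.cos(t)
        total += (a * a * c * c + b * b * s * s) * math.sqrt(a * a * s * s + b * b * c * c)
    return h * total

for a in (1.0, 1.2, 1.4, 1.5, math.pi / 2):
    b = semi_minor(a)
    print(f"a={a:.4f}  b={b:.4f}  F={F(a, b):.4f}  (2*pi = {2 * math.pi:.4f})")
```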

5 Discussion 

We leave behind a few interesting problems. 

• Convergence rate for the solution(s). We obtained a convergence rate for the energy in 
Theorem 2, but no corresponding result for the solution(s). Such a result necessitates a 
fine examination of the speed at which the energy decreases near the space of maximizing 
functions. 

• Flattening property of MVU. Assume that M satisfies the Isometry Assumption. Though we 
showed that MVU is not always consistent in the sense that it may not recover the domain 
D up to a rigid transformation, we believe that MVU always flattens the manifold M in this 
case, meaning that it returns a set S which is a subset of some d-dimensional affine subspace.
If this were true, it would make MVU consistent in terms of dimensionality reduction! 

• Solution space in general. As pointed out by Paprotny and Garcke (2012), and as we showed 
in Theorem 1, characterizing the solutions to Continuum MVU is crucial to understanding 
the behavior of Discrete MVU. In Theorem 3, we worked out the case where M is isometric 
to a convex set. What can we say when M is isometric to a sphere? Is MVU able to recover
this isometry? This question is non-trivial even when M is isometric to a circle. In fact, 
showing that the energy over ellipses (of same perimeter) is maximized for a circle is not 
straightforward, as seen in Section 4.3. 